That was actually my mistake. Although every rank must call DMPlexCreateFromCellList, only one process needs to read in the mesh; the others just create an empty mesh. It works fine now.
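For anyone hitting the same problem, here is a minimal sketch of the per-rank inputs, written in plain Python rather than as PETSc calls so that it stays self-contained. The 2x2 layout of the four quads, the vertex numbering, and the helper name `cell_list_for_rank` are illustrative assumptions, not taken from the original code.

```python
# Sketch only: the sizes and connectivity each rank would pass to the
# collective DMPlexCreateFromCellList call. No PETSc is used here.
# Assumed vertex numbering for a 2x2 block of quads:
#   6 7 8
#   3 4 5
#   0 1 2
QUAD_CELLS = [
    (0, 1, 4, 3),
    (1, 2, 5, 4),
    (3, 4, 7, 6),
    (4, 5, 8, 7),
]
NUM_VERTICES = 9


def cell_list_for_rank(rank):
    """Return (num_cells, num_vertices, flat_connectivity) for one MPI rank.

    Rank 0 supplies the whole serial mesh. Every other rank still takes part
    in the call, but with an empty mesh.
    """
    if rank == 0:
        flat = [v for cell in QUAD_CELLS for v in cell]
        return len(QUAD_CELLS), NUM_VERTICES, flat
    return 0, 0, []


for rank in range(2):  # two processes, as in the test run
    num_cells, num_vertices, cells = cell_list_for_rank(rank)
    print(f"rank {rank}: numCells={num_cells}, "
          f"numVertices={num_vertices}, connectivity entries={len(cells)}")
```

With the inputs split this way, the mesh lives entirely on rank 0 until DMPlexDistribute moves cells to the other ranks.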
Thanks,
Josh

Matthew Knepley <[email protected]> wrote on Sat, Sep 29, 2018 at 11:00 AM:

> On Fri, Sep 28, 2018 at 9:57 PM Josh L <[email protected]> wrote:
>
>> Hi,
>>
>> I am implementing DMPlex to handle my mesh in parallel, but I am getting a
>> segmentation fault during DMPlexDistribute.
>> I am testing with 2 processors on a toy mesh that has only 4 quad
>> elements.
>>
>> The code is as follows:
>> CALL DMPlexCreateFromCellList
>> Check mesh topology. Face, skeleton symmetry
>> CALL DMPlexDistribute (dmserial,0,PETSC_NULL_SF,dmmpi,ierr)
>> (segmentation fault on rank #1 here)
>>
>
> Crud, I was missing a check that the overlap is different on different
> procs. This is now fixed.
>
> Second, it might be that your Fortran ints are not the same as PetscInt.
> Try declaring it
>
> PetscInt overlap = 0
>
> and then passing 'overlap' instead.
>
> Thanks,
>
> Matt
>
>
>> I traced it back to the external library "Chaco", which PETSc calls to
>> partition the mesh.
>>
>> The following is the stack.
>>
>> For rank #1
>>
>> #10 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18 (at 0x000000000040569d)
>> #9 createmesh (rank=1, nsize=2, dmmpi=...) at /work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at 0x000000000040523b)
>> #8 dmplexdistribute_ (dm=0x320, overlap=0x1, sf=0x7fffffff4480, dmParallel=0x2aaab78a585d, ierr=0x1) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15 (at 0x00002aaaac08334e)
>> #7 DMPlexDistribute (dm=0x320, overlap=1, sf=0x7fffffff4480, dmParallel=0x2aaab78a585d) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664 (at 0x00002aaaabd12d08)
>> #6 PetscPartitionerPartition (part=0x320, dm=0x1, partSection=0x7fffffff4480, partition=0x2aaab78a585d) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675 (at 0x00002aaaabcfe92c)
>> #5 PetscPartitionerPartition_Chaco (part=0x320, dm=0x1, nparts=-48000, numVertices=-1215670179, start=0x1, adjacency=0x22, partSection=0x0, partition=0x772120) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1279 (at 0x00002aaaabd03dfd)
>> #4 interface.Z (nvtxs=800, start=0x1, adjacency=0x7fffffff4480, vwgts=0x2aaab78a585d, ewgts=0x1, x=0x22, y=0x0, z=0x0, outassignname=0x0, outfilename=0x0, assignment=0x76ff90, architecture=1, ndims_tot=0, mesh_dims=0x7fffffff7038, goal=0x0, global_method=1, local_method=1, rqi_flag=0, vmax=200, ndims=1, eigtol=2.1137067449068142e-314, seed=123636512) at /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/main/interface.c:206 (at 0x00002aaaad7aaf70)
>> #3 submain.Z (graph=0x320, nvtxs=1, nedges=-48000, using_vwgts=-1215670179, using_ewgts=1, igeom=34, coords=0x0, outassignname=0x0, outfilename=0x0, assignment=0x76ff8e, goal=0x7739a0, architecture=1, ndims_tot=0, mesh_dims=0x7fffffff7038, global_method=1, local_method=1, rqi_flag=0, vmax=200, ndims=1, eigtol=2.1137067449068142e-314, seed=123636512) at /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/submain/submain.c:151 (at 0x00002aaaad7ae52e)
>> #2 check_input.Z (graph=0x320, nvtxs=1, nedges=-48000, igeom=-1215670179, coords=0x1, graphname=0x22 <error: Cannot access memory at address 0x22>, assignment=0x76ff8e, goal=0x7739a0, architecture=1, ndims_tot=0, mesh_dims=0x7fffffff7038, global_method=1, local_method=1, rqi_flag=0, vmax=0x7fffffff47a8, ndims=1, eigtol=2.1137067449068142e-314) at /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/input/check_input.c:56 (at 0x00002aaaad7c6ed1)
>> #1 check_graph (graph=0x320, nvtxs=1, nedges=-48000) at /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:90 (at 0x00002aaaad8204d3)
>> #0 is_an_edge (vertex=0x320, v2=1, weight2=0x7fffffff4480) at /tmp/petsc-build/externalpackages/skylake/skylake/Chaco-2.2-p2/code/graph/check_graph.c:134 (at 0x00002aaaad8206d3)
>>
>> For rank #0
>>
>> #18 main () at /work/03691/yslo/stampede2/linear/rewrite/src/main.F90:18 (at 0x000000000040569d)
>> #17 createmesh (rank=0, nsize=2, dmmpi=...) at /work/03691/yslo/stampede2/linear/rewrite/src/createmesh.F90:107 (at 0x000000000040523b)
>> #16 dmplexdistribute_ (dm=0x65f300, overlap=0x0, sf=0x2aaab4c8698c, dmParallel=0xffffffffffffffff, ierr=0x0) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/ftn-custom/zplexdistribute.c:15 (at 0x00002aaaac08334e)
>> #15 DMPlexDistribute (dm=0x65f300, overlap=0, sf=0x2aaab4c8698c, dmParallel=0xffffffffffffffff) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexdistribute.c:1664 (at 0x00002aaaabd12d08)
>> #14 PetscPartitionerPartition (part=0x65f300, dm=0x0, partSection=0x2aaab4c8698c, partition=0xffffffffffffffff) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:675 (at 0x00002aaaabcfe92c)
>> #13 PetscPartitionerPartition_Chaco (part=0x65f300, dm=0x0, nparts=-1261934196, numVertices=-1, start=0x0, adjacency=0x0, partSection=0x7726c0, partition=0x7fffffff71a8) at /home1/apps/intel17/impi17_0/petsc/3.9/src/dm/impls/plex/plexpartition.c:1314 (at 0x00002aaaabd04029)
>> #12 ISCreateGeneral (comm=6681344, n=0, idx=0x2aaab4c8698c, mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292), is=0x0) at /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:671 (at 0x00002aaaab0ea94e)
>> #11 ISGeneralSetIndices (is=0x65f300, n=0, idx=0x2aaab4c8698c, mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:698 (at 0x00002aaaab0eaa77)
>> #10 ISGeneralSetIndices_General.Z (is=0x65f300, n=0, idx=0x2aaab4c8698c, mode=(PETSC_OWN_POINTER | PETSC_USE_POINTER | unknown: 4294967292)) at /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/is/impls/general/general.c:712 (at 0x00002aaaab0ef567)
>> #9 PetscLayoutSetUp (map=0x65f300) at /home1/apps/intel17/impi17_0/petsc/3.9/src/vec/is/utils/pmap.c:137 (at 0x00002aaaab0c6d52)
>> #8 PetscSplitOwnership (comm=6681344, n=0x0, N=0x2aaab4c8698c) at /home1/apps/intel17/impi17_0/petsc/3.9/src/sys/utils/psplit.c:80 (at 0x00002aaaaaeb7b00)
>> #7 PMPI_Allreduce (sendbuf=0x65f300, recvbuf=0x0, count=-1261934196, datatype=-1, op=0, comm=0) at /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1395 (at 0x00002aaab40966e6)
>> #6 MPIR_Allreduce_intra (sendbuf=0x65f300, recvbuf=0x0, count=-1261934196, datatype=-1, op=0, comm_ptr=0x0, errflag=0x7fffffff4798) at /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:339 (at 0x00002aaab409307b)
>> #5 MPIR_Allreduce_shm_generic (sendbuf=<optimized out>, recvbuf=<optimized out>, count=<optimized out>, datatype=<optimized out>, op=<optimized out>, comm_ptr=<optimized out>, errflag=<optimized out>, kind=1476395011) at /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpi/coll/allreduce.c:1137 (at 0x00002aaab409307b)
>> #4 I_MPI_COLL_SHM_KNARY_REDUCE (node_comm_ptr=<optimized out>, root=<optimized out>, localbuf=<optimized out>, sendbuf=<optimized out>, recvbuf=<optimized out>, count=<optimized out>, datatype=<optimized out>, op=<optimized out>, errflag=<optimized out>, knomial_factor=<optimized out>) at /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:1090 (at 0x00002aaab409307b)
>> #3 I_MPI_COLL_SHM_GENERIC_GATHER_REDUCE..1 (node_comm_ptr=0x65f300, root=0, is_reduce=-1261934196, localbuf=0xffffffffffffffff, sendbuf=0x0, recvbuf=0x0, count=1, datatype=1275069445, op=1476395011, errflag=0x7fffffff4798, knomial_factor=4, algo_type=2) at /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:558 (at 0x00002aaab408f3a9)
>> #2 I_MPI_memcpy (destination=<optimized out>, source=0x0, size=<optimized out>) at /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/I_MPI/include/shm_coll_templating.h:749 (at 0x00002aaab408f3a9)
>> #1 PMPIDI_CH3I_Progress (progress_state=0x65f300, is_blocking=0) at /tmp/mpi.xtmpdir.7b663e0dc22b2304e487307e376dc132.14974_32e/mpi.32e.ww14.20170405/dev/x86_64/release_mt/../../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:981 (at 0x00002aaab40e85a6)
>> #0 sched_yield () from /lib64/libc.so.6 (at 0x00002aaab7898e47)
>>
>>
>> Any idea why this is happening? The overlap is set to 0, but it is 1 on
>> rank #1.
>>
>>
>> Is there any way to know the cell and vertex numbers distributed to each
>> processor?
>> My old code partitions the mesh with Metis, and I always output cell data
>> that shows the rank number on each cell, so I can visualize how the mesh is
>> partitioned.
>> It is not necessary, but is there any way to get it in DMPlex?
>> DMPlexGetCellNumbering might work, but it fails at link time. In fact,
>> many functions under the developer category fail to link.
>>
>>
>> Thanks,
>> Yu-Sheng
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
