On Jan 5, 2012, at 6:41 PM, Jed Brown wrote:

> On Thu, Jan 5, 2012 at 17:13, Ravi Kannan <rxk at cfdrc.com> wrote:
>> Files are attached.
>
> Could you try attaching a debugger to get stack traces?
>
> It is reducing to a smaller communicator for the coarse level. The processes
> are likely both hung later in gamg.c:createLevel(). Mark, the appearance is
> that all procs that call MPI_Comm_create() are also doing things on the newly
> created communicator, even though it will be MPI_COMM_NULL on processes that
> are not part of the subgroup. Also, I'm skeptical that you can get correct
> results with MatPartitioningSetAdjacency(mpart,adj) when mpart and adj are on
> different communicators. Those other rows of adj are not moved by
> MatPartitioningApply_Parmetis().
This is scary, having two communicators running around, but the processors that are dropped from the new communicator have no rows -- that is why they are dropped. There are several logical paths through this code, and I fixed a bug that looks like this one a few weeks ago, but it looks like you have a configuration that I have not debugged yet. It would be very useful if you could give me the lines that each processor is hung on.

Mark

> I must be confused about what is actually happening.