Hi Kyle et al.

Below are stack traces from where PV is hung. I'm stumped by this and can get no foothold. I still have one chance if we can get valgrind to run with MPI on nautilus, but it's a long shot: valgrinding pvbatch on my local system throws many hundreds of errors, and I'm not sure which of them are valid reports.

PV 3.14.1 doesn't hang in pvbatch, so I'm wondering if anyone knows of a change in 3.98 that might account for the new hang?

Burlen

rank 0
#0 0x00002b0762b3f590 in gru_get_next_message () from /usr/lib64/libgru.so.0
#1  0x00002b073a2f4bd2 in MPI_SGI_grudev_progress () at grudev.c:1780
#2  0x00002b073a31cc25 in MPI_SGI_progress_devices () at progress.c:93
#3  MPI_SGI_progress () at progress.c:207
#4  0x00002b073a3244eb in MPI_SGI_request_finalize () at req.c:1548
#5  0x00002b073a2b8bee in MPI_SGI_finalize () at adi.c:667
#6  0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#7  0x00002b073969d96f in vtkProcessModule::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#8  0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#9  0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#10 0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21

rank 1
#0  0x00002b07391bde70 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1  0x00002b073a32c898 in MPI_SGI_millisleep (milliseconds=<value optimized out>) at sleep.c:34
#2  0x00002b073a326365 in MPI_SGI_slow_request_wait (request=0x7fff061959f8, status=0x7fff061959d0, set=0x7fff061959f4, gen_rc=0x7fff061959f0) at req.c:1460
#3  0x00002b073a2c6ef3 in MPI_SGI_slow_barrier (comm=1) at barrier.c:275
#4  0x00002b073a2b8bf8 in MPI_SGI_finalize () at adi.c:671
#5  0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#6  0x00002b073969d96f in vtkProcessModule::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#7  0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#8  0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#9  0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21


On 12/04/2012 05:15 PM, Burlen Loring wrote:
Hi Kyle,

I was wrong about MPI_Finalize being invoked twice; I had misread the code. I'm not sure why pvbatch is hanging in MPI_Finalize on Nautilus, and I haven't been able to find anything in the debugger. This is new in 3.98.

Burlen

On 12/03/2012 07:36 AM, Kyle Lutz wrote:
Hi Burlen,

On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring<[email protected]>  wrote:
It looks like pvserver is also impacted, hanging after the GUI disconnects.


On 11/28/2012 12:53 PM, Burlen Loring wrote:
Hi All,

Some parallel tests have been failing for some time on Nautilus.
http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614

There are MPI calls made after finalize which cause deadlock issues on SGI MPT. It affects pvbatch for sure. The following snippet shows the bug; I've filed a bug report here: http://paraview.org/Bug/view.php?id=13690


//----------------------------------------------------------------------------
bool vtkProcessModule::Finalize()
{

   ...

   vtkProcessModule::GlobalController->Finalize(1); <------- MPI_Finalize called here
This shouldn't be calling MPI_Finalize(), as the finalizedExternally argument is 1 and in vtkMPIController::Finalize():

     if (finalizedExternally == 0)
       {
       MPI_Finalize();
       }

So my guess is that it's being invoked elsewhere.

   ...

#ifdef PARAVIEW_USE_MPI
   if (vtkProcessModule::FinalizeMPI)
     {
     MPI_Barrier(MPI_COMM_WORLD); <------------------------- barrier after MPI_Finalize
     MPI_Finalize(); <-------------------------------------- second MPI_Finalize
     }
#endif
I've made a patch which should prevent this section of code from ever being called twice, by setting the FinalizeMPI flag to false after calling MPI_Finalize(). Can you take a look here: http://review.source.kitware.com/#/t/1808/ and let me know if that helps the issue.

Otherwise, would you be able to set a breakpoint on MPI_Finalize() and get a backtrace of where it gets invoked the second time? That would be very helpful in tracking down the problem.
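For reference, one way to capture that backtrace is to attach gdb to a running pvbatch rank and trap the call. This is an illustrative session only; the PID and any site-specific launch details are assumptions:

```
gdb -p <pid-of-pvbatch-rank>
(gdb) break PMPI_Finalize
(gdb) continue
# when the breakpoint fires:
(gdb) backtrace
(gdb) continue    # a second stop here would confirm a double invocation
```

Breaking on PMPI_Finalize rather than MPI_Finalize catches the call even when the MPI library routes the profiling interface internally.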

Thanks,
Kyle


_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: 
http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview
