Hi Kyle et al.
Below are stack traces from where PV is hung. I'm stumped by this and can't get a foothold. I still have one option left if we can get valgrind to run with MPI on Nautilus, but it's a long shot: valgrinding pvbatch on my local system throws many hundreds of errors, and I'm not sure which of them are valid reports.

PV 3.14.1 doesn't hang in pvbatch, so I'm wondering if anyone knows of a change in 3.98 that might account for the new hang?
Burlen
rank 0
#0 0x00002b0762b3f590 in gru_get_next_message () from
/usr/lib64/libgru.so.0
#1 0x00002b073a2f4bd2 in MPI_SGI_grudev_progress () at grudev.c:1780
#2 0x00002b073a31cc25 in MPI_SGI_progress_devices () at progress.c:93
#3 MPI_SGI_progress () at progress.c:207
#4 0x00002b073a3244eb in MPI_SGI_request_finalize () at req.c:1548
#5 0x00002b073a2b8bee in MPI_SGI_finalize () at adi.c:667
#6 0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#7 0x00002b073969d96f in vtkProcessModule::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#8 0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#9 0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2,
argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#10 0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21
rank 1
#0 0x00002b07391bde70 in __nanosleep_nocancel () from
/lib64/libpthread.so.0
#1 0x00002b073a32c898 in MPI_SGI_millisleep (milliseconds=<value
optimized out>) at sleep.c:34
#2 0x00002b073a326365 in MPI_SGI_slow_request_wait
(request=0x7fff061959f8, status=0x7fff061959d0, set=0x7fff061959f4,
gen_rc=0x7fff061959f0) at req.c:1460
#3 0x00002b073a2c6ef3 in MPI_SGI_slow_barrier (comm=1) at barrier.c:275
#4 0x00002b073a2b8bf8 in MPI_SGI_finalize () at adi.c:671
#5 0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#6 0x00002b073969d96f in vtkProcessModule::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#7 0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#8 0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2,
argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#9 0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21
On 12/04/2012 05:15 PM, Burlen Loring wrote:
Hi Kyle,
I was wrong about MPI_Finalize being invoked twice; I had misread the code. I'm not sure why pvbatch is hanging in MPI_Finalize on Nautilus. I haven't been able to find anything in the debugger. This is new for 3.98.
Burlen
On 12/03/2012 07:36 AM, Kyle Lutz wrote:
Hi Burlen,
On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring <[email protected]> wrote:
It looks like pvserver is also impacted, hanging after the GUI disconnects.
On 11/28/2012 12:53 PM, Burlen Loring wrote:
Hi All,
Some parallel tests have been failing for some time on Nautilus:
http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614

There are MPI calls made after MPI_Finalize which cause deadlock issues on SGI MPT. It affects pvbatch for sure. The following snippet shows the bug; the bug report is here:
http://paraview.org/Bug/view.php?id=13690
//----------------------------------------------------------------------------
bool vtkProcessModule::Finalize()
{
  ...
  vtkProcessModule::GlobalController->Finalize(1); // <---- MPI_Finalize called here
This shouldn't be calling MPI_Finalize(), since the finalizedExternally argument is 1 and vtkMPIController::Finalize() only calls MPI_Finalize() when that flag is 0:

  if (finalizedExternally == 0)
    {
    MPI_Finalize();
    }

So my guess is that it's being invoked elsewhere.
  ...
#ifdef PARAVIEW_USE_MPI
  if (vtkProcessModule::FinalizeMPI)
    {
    MPI_Barrier(MPI_COMM_WORLD); // <---- barrier after MPI_Finalize
    MPI_Finalize();              // <---- second MPI_Finalize
    }
#endif
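Outside of ParaView, the failure mode can be sketched as a tiny standalone program (just an illustration, not from the ParaView tree): any MPI call made after MPI_Finalize() is erroneous, and per the hangs above, MPT appears to deadlock on a post-finalize collective rather than abort:

#include <mpi.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  MPI_Finalize();                // legitimate finalize

  // Both calls below are erroneous per the MPI standard; behavior is
  // implementation defined, and on SGI MPT they appear to hang.
  MPI_Barrier(MPI_COMM_WORLD);   // barrier after finalize
  MPI_Finalize();                // second finalize
  return 0;
}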
I've made a patch which should prevent this section of code from ever being executed twice, by setting the FinalizeMPI flag to false after calling MPI_Finalize(). Can you take a look here: http://review.source.kitware.com/#/t/1808/ and let me know if that helps the issue?
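For reference, the idea is roughly the sketch below (not the exact patch; see the Gerrit review above for the real change):

#ifdef PARAVIEW_USE_MPI
  if (vtkProcessModule::FinalizeMPI)
    {
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    // Sketch only: clearing the flag makes a second pass through this
    // code a no-op instead of a second MPI_Finalize().
    vtkProcessModule::FinalizeMPI = false;
    }
#endif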
Otherwise, would you be able to set a breakpoint on MPI_Finalize() and
get a backtrace of where it gets invoked for the second time? That
would be very helpful in tracking down the problem.
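If running under a debugger on Nautilus is awkward, another option might be a small interposer built on the MPI profiling interface (PMPI). Something along these lines, linked into pvbatch, would print a backtrace each time MPI_Finalize() is entered (untested sketch; assumes glibc's execinfo is available):

// Untested sketch: intercept MPI_Finalize via the PMPI profiling
// interface and dump a backtrace each time it is called.
#include <mpi.h>
#include <execinfo.h>
#include <cstdio>

extern "C" int MPI_Finalize()
{
  void* frames[64];
  int n = backtrace(frames, 64);
  std::fprintf(stderr, "MPI_Finalize called from:\n");
  backtrace_symbols_fd(frames, n, 2 /* stderr */);
  return PMPI_Finalize(); // forward to the real MPI_Finalize
}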
Thanks,
Kyle