My bad. In the first email I sent, I was using the wrong MPI: srun instead of mpiexec, i.e. MVAPICH instead of Open MPI. So both processes were indeed getting the same process ID. Please ignore that output. The current output looks like this:
rcook@prism127 (~): mpiexec -np 2 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
Waiting for client
Connection URL: csrc://localhost:11111
ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
vtkClientSocket (0xe6a060): Socket error in call to connect. Connection refused.

ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111

Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
vtkTCPNetworkAccessManager (0x8356f0): Connect failed. Retrying for 59.9993 more seconds.

ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
vtkClientSocket (0xe6a060): Socket error in call to connect. Connection refused.

ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111

Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
vtkTCPNetworkAccessManager (0x8356f0): Connect failed. Retrying for 58.9972 more seconds.

mpiexec: killing job...

Note that there is now only one "Waiting for client" message. Again, I apologize for the mix-up. I spoke with our MPI guru, who confirmed that MPI appears to be working correctly and that I'm not making a mistake in how I launch pvserver from the batch job. Do you still want the output with the __LINE__ change?

On Nov 11, 2011, at 4:44 PM, Utkarsh Ayachit wrote:
> That sounds very odd. If process_id variable is indeed correctly set
> to 0 and 1 on the two processes, then how come there are two "Waiting
> for client" lines printed out in the first email that you sent?
>
> Can you change that cout line to the following to verify that both
> processes are indeed printing out from the same line?
>
> cout << __LINE__ << " : Waiting for client" << endl;
>
> (This is in pvserver_common.h: 58)
>
> Utkarsh
>
> On Fri, Nov 11, 2011 at 6:30 PM, Cook, Rich <[email protected]> wrote:
>> I posted the CMakeCache.txt. I also have tried to step through the code
>> using TotalView and I can see it calling MPI_Init() etc. It looks like one
>> process correctly gets rank 0 and one gets rank 1 (by inspecting the
>> process_id variable in RealMain()).
>> If I start in serial, it connects and I can view a protein molecule
>> successfully. If I start in parallel, exactly one server tries and fails to
>> connect. Am I supposed to give any extra arguments when starting in
>> parallel?
>> This is what I'm doing:
>>
>> mpiexec -np 2 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
>>
>> On Nov 11, 2011, at 11:11 AM, Utkarsh Ayachit wrote:
>>
>>> Can you post your CMakeCache.txt?
>>>
>>> Utkarsh
>>>
>>> On Fri, Nov 11, 2011 at 2:08 PM, Cook, Rich <[email protected]> wrote:
>>>> Hi, thanks, but you are incorrect.
>>>> I did set that variable and it was indeed compiled with MPI, as I said.
>>>>
>>>> rcook@prism127 (IMG_private): type pvserver
>>>> pvserver is /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>> rcook@prism127 (IMG_private): ldd /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 (0x00002aaaaacc9000)
>>>> libopen-rte.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 (0x00002aaaaaf6c000)
>>>> libopen-pal.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 (0x00002aaaab1b7000)
>>>> libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab434000)
>>>> libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaab638000)
>>>> libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab850000)
>>>> libm.so.6 => /lib64/libm.so.6 (0x00002aaaaba54000)
>>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabcd7000)
>>>> libc.so.6 => /lib64/libc.so.6 (0x00002aaaabef2000)
>>>> /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
>>>>
>>>> When the pvservers are running, I can see that they are the correct
>>>> binaries, and ldd confirms they are MPI-capable.
>>>>
>>>> rcook@prism120 (~): ldd /collab/usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/lib/paraview-3.12/pvserver | grep mpi
>>>> libmpi_cxx.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi_cxx.so.0 (0x00002aaab23bf000)
>>>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 (0x00002aaab25da000)
>>>> libopen-rte.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 (0x00002aaab287d000)
>>>> libopen-pal.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 (0x00002aaab2ac7000)
>>>>
>>>> On Nov 11, 2011, at 11:04 AM, Utkarsh Ayachit wrote:
>>>>
>>>>> Your pvserver is not built with MPI enabled. Please rebuild pvserver
>>>>> with CMake variable PARAVIEW_USE_MPI:BOOL=ON.
>>>>>
>>>>> Utkarsh
>>>>>
>>>>> On Fri, Nov 11, 2011 at 1:54 PM, Cook, Rich <[email protected]> wrote:
>>>>>> We have a tricky firewall situation here so I have to use reverse
>>>>>> tunneling per
>>>>>> http://www.paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_Connection_Over_an_ssh_Tunnel
>>>>>>
>>>>>> I'm not sure I'm doing it right. I can do it with a single server, but
>>>>>> when I try to run in parallel, it looks like something is broken. My
>>>>>> understanding is that when launched under MPI, the servers should talk
>>>>>> to each other and only one of the servers should try to connect back to
>>>>>> the client. I compiled with MPI, and am running in an MPI environment,
>>>>>> but it looks as though the pvservers are not talking to each other but
>>>>>> are each trying to make their own connection to the client. Below is
>>>>>> the output. Can anyone help me get this up and running? I know I'm
>>>>>> close.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> rcook@prism127 (IMG_private): srun -n 8 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
>>>>>> Waiting for client
>>>>>> Connection URL: csrc://localhost:11111
>>>>>> Client connected.
>>>>>> Waiting for client
>>>>>> Connection URL: csrc://localhost:11111
>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>>>
>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>>
>>>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 59.9994 more seconds.
>>>>>>
>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>>>
>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>>
>>>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 58.9972 more seconds.
>>>>>>
>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>>>
>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>>
>>>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 57.9952 more seconds.
>>>>>>
>>>>>> etc. etc. etc.
>>>>>> --
>>>>>> ✐Richard Cook
>>>>>> ✇ Lawrence Livermore National Laboratory
>>>>>> Bldg-453 Rm-4024, Mail Stop L-557
>>>>>> 7000 East Avenue, Livermore, CA, 94550, USA
>>>>>> ☎ (office) (925) 423-9605
>>>>>> ☎ (fax) (925) 423-6961
>>>>>> ---
>>>>>> Information Management & Graphics Grp., Services & Development Div.,
>>>>>> Integrated Computing & Communications Dept.
>>>>>> (opinions expressed herein are mine and not those of LLNL)
>>>>>>
>>>>>> _______________________________________________
>>>>>> Powered by www.kitware.com
>>>>>>
>>>>>> Visit other Kitware open-source projects at
>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>
>>>>>> Please keep messages on-topic and check the ParaView Wiki at:
>>>>>> http://paraview.org/Wiki/ParaView
>>>>>>
>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>> http://www.paraview.org/mailman/listinfo/paraview
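The mix-up behind the first message (an Open MPI build of pvserver started with a launcher from a different MPI, so the processes all got the same process ID and each tried to connect back to the client) can be caught before submitting the job. What follows is a minimal sketch of such a pre-flight check, not a ParaView or Open MPI tool: the script and its function names (`mpi_prefix_from_ldd`, `launcher_matches_prefix`, `check_binary`) are hypothetical. It assumes a POSIX shell with `ldd` and `awk`, and it assumes the usual Open MPI layout in which `mpiexec` sits in `<prefix>/bin` next to `<prefix>/lib/libmpi.so`; symlinked or site-wrapped launchers would need resolving first.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of ParaView or Open MPI).
# Assumption: Open MPI-style layout, <prefix>/lib/libmpi.so* and <prefix>/bin/mpiexec.

# Read `ldd` output on stdin and print the install prefix of the libmpi.so it
# resolved, e.g. "/usr/local/tools/openmpi-gnu-1.4.3". Prints nothing if libmpi
# is absent or "not found". libmpi_cxx.so lines do not match ^libmpi\.so.
mpi_prefix_from_ldd() {
  awk '$1 ~ /^libmpi\.so/ && $2 == "=>" && $3 ~ /^\// {
         sub(/\/lib(64)?\/libmpi\.so.*$/, "", $3)
         print $3
         exit
       }'
}

# Succeed if launcher path $2 lives directly under "<prefix>/bin/", prefix = $1.
launcher_matches_prefix() {
  [ -n "$1" ] || return 1
  case "$2" in
    "$1"/bin/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the MPI that binary $1 links against with the mpiexec found on PATH.
check_binary() {
  bin=$1
  prefix=$(ldd "$bin" | mpi_prefix_from_ldd)
  if [ -z "$prefix" ]; then
    echo "could not resolve libmpi for $bin (not linked, or not found by ldd)" >&2
    return 2
  fi
  if ! launcher=$(command -v mpiexec); then
    echo "no mpiexec found on PATH" >&2
    return 2
  fi
  if launcher_matches_prefix "$prefix" "$launcher"; then
    echo "OK: $launcher belongs to $prefix"
  else
    echo "MISMATCH: $bin links $prefix but mpiexec is $launcher" >&2
    return 1
  fi
}

# Only run the check when a binary is given, so the functions can be reused.
if [ "$#" -gt 0 ]; then
  check_binary "$1"
fi
```

It could be run as, for example, `sh check_mpi_launcher.sh /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver` (the script filename is made up). It only inspects the `mpiexec` on PATH, so it would not flag calling `srun` directly as in the first message; it answers the narrower question of whether `mpiexec` and the binary agree on which MPI installation to use.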
