I posted the CMakeCache.txt. I have also tried stepping through the code using TotalView, and I can see it calling MPI_Init() etc. It looks like one process correctly gets rank 0 and one gets rank 1 (by inspecting the process_id variable in RealMain()). If I start in serial, it connects and I can view a protein molecule successfully. If I start in parallel, exactly one server tries and fails to connect. Am I supposed to give any extra arguments when starting in parallel? This is what I'm doing:
mpiexec -np 2 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost

On Nov 11, 2011, at 11:11 AM, Utkarsh Ayachit wrote:

> Can you post your CMakeCache.txt?
>
> Utkarsh
>
> On Fri, Nov 11, 2011 at 2:08 PM, Cook, Rich <[email protected]> wrote:
>> Hi, thanks, but you are incorrect.
>> I did set that variable and it was indeed compiled with MPI, as I said.
>>
>> rcook@prism127 (IMG_private): type pvserver
>> pvserver is /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>> rcook@prism127 (IMG_private): ldd /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 (0x00002aaaaacc9000)
>> libopen-rte.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 (0x00002aaaaaf6c000)
>> libopen-pal.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 (0x00002aaaab1b7000)
>> libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab434000)
>> libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaab638000)
>> libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab850000)
>> libm.so.6 => /lib64/libm.so.6 (0x00002aaaaba54000)
>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabcd7000)
>> libc.so.6 => /lib64/libc.so.6 (0x00002aaaabef2000)
>> /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
>>
>> When the pvservers are running, I can see that they are the correct
>> binaries, and ldd confirms they are MPI-capable.
>>
>> rcook@prism120 (~): ldd /collab/usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/lib/paraview-3.12/pvserver | grep mpi
>> libmpi_cxx.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi_cxx.so.0 (0x00002aaab23bf000)
>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 (0x00002aaab25da000)
>> libopen-rte.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 (0x00002aaab287d000)
>> libopen-pal.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 (0x00002aaab2ac7000)
>>
>>
>> On Nov 11, 2011, at 11:04 AM, Utkarsh Ayachit wrote:
>>
>>> Your pvserver is not built with MPI enabled. Please rebuild pvserver
>>> with CMake variable PARAVIEW_USE_MPI:BOOL=ON.
>>>
>>> Utkarsh
>>>
>>> On Fri, Nov 11, 2011 at 1:54 PM, Cook, Rich <[email protected]> wrote:
>>>> We have a tricky firewall situation here so I have to use reverse
>>>> tunneling per
>>>> http://www.paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_Connection_Over_an_ssh_Tunnel
>>>>
>>>> I'm not sure I'm doing it right. I can do it with a single server, but
>>>> when I try to run in parallel, it looks like something is broken. My
>>>> understanding is that when launched under MPI, the servers should talk to
>>>> each other and only one of the servers should try to connect back to the
>>>> client. I compiled with MPI, and am running in an MPI environment, but it
>>>> looks as though the pvservers are not talking to each other but are each
>>>> trying to make their own connection to the client. Below is the output.
>>>> Can anyone help me get this up and running? I know I'm close.
>>>>
>>>> Thanks!
>>>>
>>>> rcook@prism127 (IMG_private): srun -n 8 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
>>>> Waiting for client
>>>> Connection URL: csrc://localhost:11111
>>>> Client connected.
>>>> Waiting for client
>>>> Connection URL: csrc://localhost:11111
>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>
>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>
>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 59.9994 more seconds.
>>>>
>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>
>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>
>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 58.9972 more seconds.
>>>>
>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>
>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>
>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed.
>>>> Retrying for 57.9952 more seconds.
>>>>
>>>>
>>>> etc. etc. etc.
>>>> --
>>>> ✐Richard Cook
>>>> ✇ Lawrence Livermore National Laboratory
>>>> Bldg-453 Rm-4024, Mail Stop L-557
>>>> 7000 East Avenue, Livermore, CA, 94550, USA
>>>> ☎ (office) (925) 423-9605
>>>> ☎ (fax) (925) 423-6961
>>>> ---
>>>> Information Management & Graphics Grp., Services & Development Div.,
>>>> Integrated Computing & Communications Dept.
>>>> (opinions expressed herein are mine and not those of LLNL)
>>>>
>>>> _______________________________________________
>>>> Powered by www.kitware.com
>>>>
>>>> Visit other Kitware open-source projects at
>>>> http://www.kitware.com/opensource/opensource.html
>>>>
>>>> Please keep messages on-topic and check the ParaView Wiki at:
>>>> http://paraview.org/Wiki/ParaView
>>>>
>>>> Follow this link to subscribe/unsubscribe:
>>>> http://www.paraview.org/mailman/listinfo/paraview
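
For anyone following this thread, one way to see what each launched process was told about its place in the job is a small shell helper like the one below. This is only a sketch and not part of ParaView: the name report_rank is made up for illustration, and it assumes the launcher exports either Open MPI's OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE or SLURM's SLURM_PROCID and SLURM_NTASKS. It reports the launcher's view only; it does not show what MPI_Init() hands back inside pvserver.

```shell
# Hypothetical helper (not part of ParaView): print the rank and job size
# that the launcher exported to this process, if any.
report_rank() {
    if [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
        # Assumed to be set by Open MPI's mpiexec/mpirun for each process.
        echo "openmpi rank=${OMPI_COMM_WORLD_RANK} size=${OMPI_COMM_WORLD_SIZE:-?}"
    elif [ -n "${SLURM_PROCID:-}" ]; then
        # Assumed to be set by SLURM's srun for each task.
        echo "slurm rank=${SLURM_PROCID} size=${SLURM_NTASKS:-?}"
    else
        # Neither launcher's variables are present in the environment.
        echo "no launcher rank found"
    fi
}
```

Saved to a file (say report_rank.sh, again a made-up name), it could be run under each launcher in place of pvserver, for example with mpiexec -np 2 sh -c '. ./report_rank.sh; report_rank', and the output compared between the mpiexec and srun launches.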
