And to clarify, if I just do serial, I get this good behavior:

rcook@prism127 (~): /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
Waiting for client
Connection URL: csrc://localhost:11111
Client connected.

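In case it helps anyone who hits the same launcher mismatch described in the quoted messages below (srun from one MPI stack starting a binary linked against Open MPI), here is a minimal sketch of a check that can be run under each launcher before starting pvserver. It is not part of ParaView, and the function name describe_rank is made up for illustration. It relies on OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE, which Open MPI's mpiexec exports to the processes it starts, and on SLURM_PROCID and SLURM_NTASKS, which srun exports. The interpretation in the comments is my assumption, not something confirmed on this list.

```shell
#!/bin/sh
# Hypothetical helper (not part of ParaView): report which launcher's
# rank variables are visible to this process.
describe_rank() {
    if [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
        # Exported by Open MPI's mpiexec/mpirun to each process it starts.
        echo "openmpi rank=${OMPI_COMM_WORLD_RANK} size=${OMPI_COMM_WORLD_SIZE:-?}"
    elif [ -n "${SLURM_PROCID:-}" ]; then
        # Exported by srun. Assumption: if only these are present, an
        # Open MPI-linked binary may not learn its rank from the launcher.
        echo "slurm-only rank=${SLURM_PROCID} size=${SLURM_NTASKS:-?}"
    else
        echo "no launcher rank variables found"
    fi
}

describe_rank
```

Saved as, say, describe_rank.sh (a made-up name), it could be run as `mpiexec -np 2 sh describe_rank.sh` and as `srun -n 2 sh describe_rank.sh`. Distinct "openmpi" ranks under mpiexec would suggest the launcher and the library pvserver is linked against belong together.
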
On Nov 11, 2011, at 4:51 PM, Cook, Rich wrote:

> My bad.
> In the first email I sent, I was using the wrong MPI (srun instead of
> mpiexec -- mvapich instead of openmpi). So both processes were indeed
> getting set to the same process ID. Please ignore that output.
> The current output looks like this:
>
> rcook@prism127 (~): mpiexec -np 2 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
> Waiting for client
> Connection URL: csrc://localhost:11111
> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
> vtkClientSocket (0xe6a060): Socket error in call to connect. Connection refused.
>
> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
> vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111
>
> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
> vtkTCPNetworkAccessManager (0x8356f0): Connect failed. Retrying for 59.9993 more seconds.
>
> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
> vtkClientSocket (0xe6a060): Socket error in call to connect. Connection refused.
>
> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
> vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111
>
> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
> vtkTCPNetworkAccessManager (0x8356f0): Connect failed. Retrying for 58.9972 more seconds.
>
> mpiexec: killing job...
>
>
> Note the presence of only one connecting message. Again, I apologize for
> the mixup. I spoke with our MPI guru and have confirmed that MPI appears
> to be working correctly and I'm not making a mistake in how I launch
> pvserver from the batch job perspective.
>
> Do you still want that output?
>
> On Nov 11, 2011, at 4:44 PM, Utkarsh Ayachit wrote:
>
>> That sounds very odd. If the process_id variable is indeed correctly set
>> to 0 and 1 on the two processes, then how come there are two "Waiting
>> for client" lines printed out in the first email that you sent?
>>
>> Can you change that cout line to the following to verify that both
>> processes are indeed printing from the same line?
>>
>> cout << __LINE__ << " : Waiting for client" << endl;
>>
>> (This is in pvserver_common.h: 58)
>>
>> Utkarsh
>>
>> On Fri, Nov 11, 2011 at 6:30 PM, Cook, Rich <[email protected]> wrote:
>>> I posted the CMakeCache.txt. I have also tried to step through the code
>>> using TotalView and I can see it calling MPI_Init() etc. It looks like
>>> one process correctly gets rank 0 and one gets rank 1 (by inspecting the
>>> process_id variable in RealMain()).
>>> If I start in serial, it connects and I can view a protein molecule
>>> successfully. If I start in parallel, exactly one server tries and fails
>>> to connect. Am I supposed to give any extra arguments when starting in
>>> parallel?
>>> This is what I'm doing:
>>>
>>> mpiexec -np 2 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
>>>
>>>
>>>
>>> On Nov 11, 2011, at 11:11 AM, Utkarsh Ayachit wrote:
>>>
>>>> Can you post your CMakeCache.txt?
>>>>
>>>> Utkarsh
>>>>
>>>> On Fri, Nov 11, 2011 at 2:08 PM, Cook, Rich <[email protected]> wrote:
>>>>> Hi, thanks, but you are incorrect.
>>>>> I did set that variable and it was indeed compiled with MPI, as I said.
>>>>>
>>>>> rcook@prism127 (IMG_private): type pvserver
>>>>> pvserver is /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>> rcook@prism127 (IMG_private): ldd /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 (0x00002aaaaacc9000)
>>>>> libopen-rte.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 (0x00002aaaaaf6c000)
>>>>> libopen-pal.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 (0x00002aaaab1b7000)
>>>>> libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab434000)
>>>>> libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaab638000)
>>>>> libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab850000)
>>>>> libm.so.6 => /lib64/libm.so.6 (0x00002aaaaba54000)
>>>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabcd7000)
>>>>> libc.so.6 => /lib64/libc.so.6 (0x00002aaaabef2000)
>>>>> /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
>>>>>
>>>>> When the pvservers are running, I can see that they are the correct
>>>>> binaries, and ldd confirms they are MPI-capable.
>>>>>
>>>>> rcook@prism120 (~): ldd /collab/usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/lib/paraview-3.12/pvserver | grep mpi
>>>>> libmpi_cxx.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi_cxx.so.0 (0x00002aaab23bf000)
>>>>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 (0x00002aaab25da000)
>>>>> libopen-rte.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 (0x00002aaab287d000)
>>>>> libopen-pal.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 (0x00002aaab2ac7000)
>>>>>
>>>>>
>>>>> On Nov 11, 2011, at 11:04 AM, Utkarsh Ayachit wrote:
>>>>>
>>>>>> Your pvserver is not built with MPI enabled. Please rebuild pvserver
>>>>>> with CMake variable PARAVIEW_USE_MPI:BOOL=ON.
>>>>>>
>>>>>> Utkarsh
>>>>>>
>>>>>> On Fri, Nov 11, 2011 at 1:54 PM, Cook, Rich <[email protected]> wrote:
>>>>>>> We have a tricky firewall situation here so I have to use reverse
>>>>>>> tunneling per
>>>>>>> http://www.paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_Connection_Over_an_ssh_Tunnel
>>>>>>>
>>>>>>> I'm not sure I'm doing it right. I can do it with a single server, but
>>>>>>> when I try to run in parallel, it looks like something is broken. My
>>>>>>> understanding is that when launched under MPI, the servers should talk
>>>>>>> to each other and only one of the servers should try to connect back to
>>>>>>> the client. I compiled with MPI, and am running in an MPI environment,
>>>>>>> but it looks as though the pvservers are not talking to each other but
>>>>>>> are each trying to make their own connection to the client. Below is
>>>>>>> the output. Can anyone help me get this up and running? I know I'm
>>>>>>> close.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> rcook@prism127 (IMG_private): srun -n 8 /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver --use-offscreen-rendering --reverse-connection --client-host=localhost
>>>>>>> Waiting for client
>>>>>>> Connection URL: csrc://localhost:11111
>>>>>>> Client connected.
>>>>>>> Waiting for client
>>>>>>> Connection URL: csrc://localhost:11111
>>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>>>>
>>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>>>
>>>>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 59.9994 more seconds.
>>>>>>>
>>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>>>>
>>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>>>
>>>>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 58.9972 more seconds.
>>>>>>>
>>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 481
>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection refused.
>>>>>>>
>>>>>>> ERROR: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, line 53
>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>>>
>>>>>>> Warning: In /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx, line 250
>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying for 57.9952 more seconds.
>>>>>>>
>>>>>>>
>>>>>>> etc. etc. etc.
>>>>>>> --
>>>>>>> ✐Richard Cook
>>>>>>> ✇ Lawrence Livermore National Laboratory
>>>>>>> Bldg-453 Rm-4024, Mail Stop L-557
>>>>>>> 7000 East Avenue, Livermore, CA, 94550, USA
>>>>>>> ☎ (office) (925) 423-9605
>>>>>>> ☎ (fax) (925) 423-6961
>>>>>>> ---
>>>>>>> Information Management & Graphics Grp., Services & Development Div.,
>>>>>>> Integrated Computing & Communications Dept.
>>>>>>> (opinions expressed herein are mine and not those of LLNL)
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Powered by www.kitware.com
>>>>>>>
>>>>>>> Visit other Kitware open-source projects at
>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>
>>>>>>> Please keep messages on-topic and check the ParaView Wiki at:
>>>>>>> http://paraview.org/Wiki/ParaView
>>>>>>>
>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>> http://www.paraview.org/mailman/listinfo/paraview

--
✐Richard Cook
✇ Lawrence Livermore National Laboratory
Bldg-453 Rm-4024, Mail Stop L-557
7000 East Avenue, Livermore, CA, 94550, USA
☎ (office) (925) 423-9605
☎ (fax) (925) 423-6961
---
Information Management & Graphics Grp., Services & Development Div.,
Integrated Computing & Communications Dept.
(opinions expressed herein are mine and not those of LLNL)
