Great, thanks! It be workin'! Thanks for the help.
-- Rich

On Nov 11, 2011, at 5:16 PM, Utkarsh Ayachit wrote:
> You can always apply the "Process ID Scalars" filter to a Sphere source
> and that will show colors for each process.
>
> Utkarsh
>
> On Fri, Nov 11, 2011 at 8:10 PM, Cook, Rich <[email protected]> wrote:
>> Good catch. Indeed, the root node was on prism120, another node in the
>> batch pool. When I tunneled to that host instead of the other, I got a
>> good connection with 2 servers using MPI.
>> Just to be sure, is there a way to query the state of the connection
>> from within the client? I cannot tell from the GUI or the server output
>> whether I am connected to 2 servers or 1. I am certain I launched two
>> servers and got a good connection, and I can view a molecule, but...
>> I'm paranoid. You never know. :-)
>>
>> This is going to be nasty to try to make work for our users.
>>
>> Thanks for the help!
>> -- Rich
>>
>> On Nov 11, 2011, at 4:57 PM, Utkarsh Ayachit wrote:
>>
>>> Very peculiar. I wonder if MPI is running the root node on some other
>>> node. Are you sure the process is run on the same machine? Can you
>>> try putting an IP address or real hostname instead of localhost?
>>>
>>> Utkarsh
>>>
>>> On Fri, Nov 11, 2011 at 7:54 PM, Cook, Rich <[email protected]> wrote:
>>>> And to clarify, if I just do serial, I get this good behavior:
>>>>
>>>> rcook@prism127 (~):
>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>> --use-offscreen-rendering --reverse-connection --client-host=localhost
>>>> Waiting for client
>>>> Connection URL: csrc://localhost:11111
>>>> Client connected.
>>>>
>>>> On Nov 11, 2011, at 4:51 PM, Cook, Rich wrote:
>>>>
>>>>> My bad.
>>>>> In the first email I sent, I was using the wrong MPI (srun instead of
>>>>> mpiexec -- mvapich instead of openmpi), so both processes were indeed
>>>>> getting set to the same process ID. Please ignore that output.
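One ParaView-agnostic way to answer "did my tunnel land on the node where the root rank is listening?" is to probe the tunneled port directly. This is a minimal sketch; `port_open` is a hypothetical helper, not part of ParaView:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    A refused or timed-out connection (both raise OSError) means nothing
    is reachable on that port through this host.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the default pvserver port; hostnames are illustrative.
for host in ("localhost",):
    print(host, port_open(host, 11111))
```

Run from the client side, a `False` for the host you tunneled to suggests the tunnel endpoint and the MPI root rank are on different nodes, which is exactly the situation diagnosed above.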
>>>>> The current output looks like this:
>>>>>
>>>>> rcook@prism127 (~): mpiexec -np 2
>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>> --use-offscreen-rendering --reverse-connection --client-host=localhost
>>>>> Waiting for client
>>>>> Connection URL: csrc://localhost:11111
>>>>> ERROR: In
>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>> line 481
>>>>> vtkClientSocket (0xe6a060): Socket error in call to connect.
>>>>> Connection refused.
>>>>>
>>>>> ERROR: In
>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>> line 53
>>>>> vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111
>>>>>
>>>>> Warning: In
>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>> line 250
>>>>> vtkTCPNetworkAccessManager (0x8356f0): Connect failed. Retrying for
>>>>> 59.9993 more seconds.
>>>>>
>>>>> ERROR: In
>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>> line 481
>>>>> vtkClientSocket (0xe6a060): Socket error in call to connect.
>>>>> Connection refused.
>>>>>
>>>>> ERROR: In
>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>> line 53
>>>>> vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111
>>>>>
>>>>> Warning: In
>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>> line 250
>>>>> vtkTCPNetworkAccessManager (0x8356f0): Connect failed. Retrying for
>>>>> 58.9972 more seconds.
>>>>>
>>>>> mpiexec: killing job...
>>>>>
>>>>> Note the presence of only one connecting message. Again, I apologize
>>>>> for the mixup. I spoke with our MPI guru and have confirmed that MPI
>>>>> appears to be working correctly and that I'm not making a mistake in
>>>>> how I launch pvserver from the batch-job perspective.
>>>>>
>>>>> Do you still want that output?
>>>>>
>>>>> On Nov 11, 2011, at 4:44 PM, Utkarsh Ayachit wrote:
>>>>>
>>>>>> That sounds very odd. If the process_id variable is indeed correctly
>>>>>> set to 0 and 1 on the two processes, then how come there are two
>>>>>> "Waiting for client" lines printed out in the first email that you
>>>>>> sent?
>>>>>>
>>>>>> Can you change that cout line to the following to verify that both
>>>>>> processes are indeed printing from the same line?
>>>>>>
>>>>>> cout << __LINE__ << " : Waiting for client" << endl;
>>>>>>
>>>>>> (This is in pvserver_common.h:58)
>>>>>>
>>>>>> Utkarsh
>>>>>>
>>>>>> On Fri, Nov 11, 2011 at 6:30 PM, Cook, Rich <[email protected]> wrote:
>>>>>>> I posted the CMakeCache.txt. I also have tried to step through the
>>>>>>> code using TotalView, and I can see it calling MPI_Init() etc. It
>>>>>>> looks like one process correctly gets rank 0 and one gets rank 1 (by
>>>>>>> inspecting the process_id variable in RealMain()).
>>>>>>> If I start in serial, it connects and I can view a protein molecule
>>>>>>> successfully. If I start in parallel, exactly one server tries and
>>>>>>> fails to connect. Am I supposed to give any extra arguments when
>>>>>>> starting in parallel?
>>>>>>> This is what I'm doing:
>>>>>>>
>>>>>>> mpiexec -np 2
>>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>> --use-offscreen-rendering --reverse-connection
>>>>>>> --client-host=localhost
>>>>>>>
>>>>>>> On Nov 11, 2011, at 11:11 AM, Utkarsh Ayachit wrote:
>>>>>>>
>>>>>>>> Can you post your CMakeCache.txt?
>>>>>>>>
>>>>>>>> Utkarsh
>>>>>>>>
>>>>>>>> On Fri, Nov 11, 2011 at 2:08 PM, Cook, Rich <[email protected]> wrote:
>>>>>>>>> Hi, thanks, but you are incorrect.
>>>>>>>>> I did set that variable and it was indeed compiled with MPI, as I
>>>>>>>>> said.
>>>>>>>>>
>>>>>>>>> rcook@prism127 (IMG_private): type pvserver
>>>>>>>>> pvserver is
>>>>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>>>> rcook@prism127 (IMG_private): ldd
>>>>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>>>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0
>>>>>>>>> (0x00002aaaaacc9000)
>>>>>>>>> libopen-rte.so.0 =>
>>>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0
>>>>>>>>> (0x00002aaaaaf6c000)
>>>>>>>>> libopen-pal.so.0 =>
>>>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0
>>>>>>>>> (0x00002aaaab1b7000)
>>>>>>>>> libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab434000)
>>>>>>>>> libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaab638000)
>>>>>>>>> libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab850000)
>>>>>>>>> libm.so.6 => /lib64/libm.so.6 (0x00002aaaaba54000)
>>>>>>>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabcd7000)
>>>>>>>>> libc.so.6 => /lib64/libc.so.6 (0x00002aaaabef2000)
>>>>>>>>> /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
>>>>>>>>>
>>>>>>>>> When the pvservers are running, I can see that they are the correct
>>>>>>>>> binaries, and ldd confirms they are MPI-capable.
>>>>>>>>>
>>>>>>>>> rcook@prism120 (~): ldd
>>>>>>>>> /collab/usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/lib/paraview-3.12/pvserver
>>>>>>>>> | grep mpi
>>>>>>>>> libmpi_cxx.so.0 =>
>>>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi_cxx.so.0
>>>>>>>>> (0x00002aaab23bf000)
>>>>>>>>> libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0
>>>>>>>>> (0x00002aaab25da000)
>>>>>>>>> libopen-rte.so.0 =>
>>>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0
>>>>>>>>> (0x00002aaab287d000)
>>>>>>>>> libopen-pal.so.0 =>
>>>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0
>>>>>>>>> (0x00002aaab2ac7000)
>>>>>>>>>
>>>>>>>>> On Nov 11, 2011, at 11:04 AM, Utkarsh Ayachit wrote:
>>>>>>>>>
>>>>>>>>>> Your pvserver is not built with MPI enabled. Please rebuild
>>>>>>>>>> pvserver with the CMake variable PARAVIEW_USE_MPI:BOOL=ON.
>>>>>>>>>>
>>>>>>>>>> Utkarsh
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 11, 2011 at 1:54 PM, Cook, Rich <[email protected]> wrote:
>>>>>>>>>>> We have a tricky firewall situation here, so I have to use
>>>>>>>>>>> reverse tunneling per
>>>>>>>>>>> http://www.paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_Connection_Over_an_ssh_Tunnel
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure I'm doing it right. I can do it with a single
>>>>>>>>>>> server, but when I try to run in parallel, it looks like
>>>>>>>>>>> something is broken. My understanding is that when launched
>>>>>>>>>>> under MPI, the servers should talk to each other and only one of
>>>>>>>>>>> the servers should try to connect back to the client. I compiled
>>>>>>>>>>> with MPI, and am running in an MPI environment, but it looks as
>>>>>>>>>>> though the pvservers are not talking to each other but are each
>>>>>>>>>>> trying to make their own connection to the client. Below is the
>>>>>>>>>>> output. Can anyone help me get this up and running? I know I'm
>>>>>>>>>>> close.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
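The reverse-connection handshake described in the message above (the client listens; only the server's rank 0 dials back) can be sketched with plain sockets. This is a toy illustration of the protocol shape, not ParaView's actual implementation:

```python
import socket
import threading

def client_side(listener, results):
    """The client in --reverse-connection mode: it LISTENS for the server."""
    conn, _ = listener.accept()
    results.append(conn.recv(64).decode())
    conn.close()

def server_rank(rank, host, port):
    """One pvserver-like rank; only rank 0 dials back to the client."""
    if rank != 0:
        return  # non-root ranks talk to rank 0 over MPI, never to the client
    with socket.create_connection((host, port)) as s:
        s.sendall(b"hello from rank 0")

# Demo: one listening "client", two "ranks"; exactly one connection arrives.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # ephemeral port stands in for 11111
listener.listen(2)
host, port = listener.getsockname()
results = []
t = threading.Thread(target=client_side, args=(listener, results))
t.start()
for rank in (0, 1):
    server_rank(rank, host, port)
t.join()
print(results)  # exactly one message: only rank 0 connected
```

If more than one connection attempt shows up (as in the logs below), that is the signature of ranks not actually sharing an MPI communicator.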
>>>>>>>>>>>
>>>>>>>>>>> rcook@prism127 (IMG_private): srun -n 8
>>>>>>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>>>>>> --use-offscreen-rendering --reverse-connection
>>>>>>>>>>> --client-host=localhost
>>>>>>>>>>> Waiting for client
>>>>>>>>>>> Connection URL: csrc://localhost:11111
>>>>>>>>>>> Client connected.
>>>>>>>>>>> Waiting for client
>>>>>>>>>>> Connection URL: csrc://localhost:11111
>>>>>>>>>>> ERROR: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>>>>>> line 481
>>>>>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect.
>>>>>>>>>>> Connection refused.
>>>>>>>>>>>
>>>>>>>>>>> ERROR: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>>>>>> line 53
>>>>>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server
>>>>>>>>>>> localhost:11111
>>>>>>>>>>>
>>>>>>>>>>> Warning: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>>>>>> line 250
>>>>>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying
>>>>>>>>>>> for 59.9994 more seconds.
>>>>>>>>>>>
>>>>>>>>>>> ERROR: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>>>>>> line 481
>>>>>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect.
>>>>>>>>>>> Connection refused.
>>>>>>>>>>>
>>>>>>>>>>> ERROR: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>>>>>> line 53
>>>>>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server
>>>>>>>>>>> localhost:11111
>>>>>>>>>>>
>>>>>>>>>>> Warning: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>>>>>> line 250
>>>>>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying
>>>>>>>>>>> for 58.9972 more seconds.
>>>>>>>>>>>
>>>>>>>>>>> ERROR: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>>>>>> line 481
>>>>>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect.
>>>>>>>>>>> Connection refused.
>>>>>>>>>>>
>>>>>>>>>>> ERROR: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>>>>>> line 53
>>>>>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server
>>>>>>>>>>> localhost:11111
>>>>>>>>>>>
>>>>>>>>>>> Warning: In
>>>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>>>>>> line 250
>>>>>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed. Retrying
>>>>>>>>>>> for 57.9952 more seconds.
>>>>>>>>>>>
>>>>>>>>>>> etc. etc. etc.
>>>>>>>>>>> --
>>>>>>>>>>> ✐Richard Cook
>>>>>>>>>>> ✇ Lawrence Livermore National Laboratory
>>>>>>>>>>> Bldg-453 Rm-4024, Mail Stop L-557
>>>>>>>>>>> 7000 East Avenue, Livermore, CA, 94550, USA
>>>>>>>>>>> ☎ (office) (925) 423-9605
>>>>>>>>>>> ☎ (fax) (925) 423-6961
>>>>>>>>>>> ---
>>>>>>>>>>> Information Management & Graphics Grp., Services & Development
>>>>>>>>>>> Div., Integrated Computing & Communications Dept.
>>>>>>>>>>> (opinions expressed herein are mine and not those of LLNL)

--
✐Richard Cook
✇ Lawrence Livermore National Laboratory
Bldg-453 Rm-4024, Mail Stop L-557
7000 East Avenue, Livermore, CA, 94550, USA
☎ (office) (925) 423-9605
☎ (fax) (925) 423-6961
---
Information Management & Graphics Grp., Services & Development Div.,
Integrated Computing & Communications Dept.
(opinions expressed herein are mine and not those of LLNL)

_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at
http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at:
http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview
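The first failure mode diagnosed in this thread -- launching with srun against a mismatched MPI, so every pvserver process believes it is rank 0 and dials the client itself -- can be sketched as a toy model. The names `get_rank` and `who_connects` are illustrative only, not ParaView code:

```python
def get_rank(mpi_rank=None):
    """Rank as seen by a process. None models a launcher that never
    wired up MPI, so the process falls back to believing it is rank 0."""
    return 0 if mpi_rank is None else mpi_rank

def who_connects(ranks):
    """Indices of the processes that would dial back to the client
    (the reverse-connection logic: only rank 0 connects)."""
    return [i for i, r in enumerate(ranks) if get_rank(r) == 0]

# Properly launched: ranks 0..3 are distinct -> only process 0 connects.
print(who_connects([0, 1, 2, 3]))              # [0]
# Mismatched launcher: no rank info reaches any process, so every one
# thinks it is rank 0 and tries to connect -- hence the repeated
# "Waiting for client" lines and "Connection refused" retries above.
print(who_connects([None, None, None, None]))  # [0, 1, 2, 3]
```

The second failure mode (the root rank landing on a different batch node than the ssh tunnel) produces the same "Connection refused" symptom with a correct launcher, which is why checking where rank 0 actually runs resolved it.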
