Good catch.  Indeed, the root node was on prism120, another node in the batch 
pool.  When I tunneled to that host instead of the other, I got a good 
connection with 2 servers using MPI.

Just to be sure, is there a way to query the state of the connection from 
within the client?  I cannot tell from the GUI or the server output whether I 
am connected to 2 servers or 1.  I am certain I launched two servers and got a 
good connection, and I can view a molecule, but... I'm paranoid.  You never 
know.  :-)

This is going to be nasty to make work for our users.

Thanks for the help!
-- Rich

On Nov 11, 2011, at 4:57 PM, Utkarsh Ayachit wrote:

> Very peculiar. I wonder if MPI is running the root node on some other
> node. Are you sure the process is running on the same machine? Can you
> try putting an IP address or real hostname instead of localhost?
>
> Utkarsh
>
> On Fri, Nov 11, 2011 at 7:54 PM, Cook, Rich <[email protected]> wrote:
>> And to clarify, if I just do serial, I get this good behavior:
>>
>> rcook@prism127 (~): 
>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>  --use-offscreen-rendering  --reverse-connection  --client-host=localhost
>> Waiting for client
>> Connection URL: csrc://localhost:11111
>> Client connected.
>>
>> On Nov 11, 2011, at 4:51 PM, Cook, Rich wrote:
>>
>>> My bad.
>>> In the first email I sent, I was using the wrong MPI (srun instead of
>>> mpiexec, i.e. mvapich instead of openmpi).  So both processes were indeed
>>> getting set to the same process ID.  Please ignore that output.
>>> The current output looks like this:
>>>
>>> rcook@prism127 (~): mpiexec -np 2 
>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>  --use-offscreen-rendering  --reverse-connection  --client-host=localhost
>>> Waiting for client
>>> Connection URL: csrc://localhost:11111
>>> ERROR: In 
>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, 
>>> line 481
>>> vtkClientSocket (0xe6a060): Socket error in call to connect. Connection 
>>> refused.
>>>
>>> ERROR: In 
>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>  line 53
>>> vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111
>>>
>>> Warning: In 
>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>  line 250
>>> vtkTCPNetworkAccessManager (0x8356f0): Connect failed.  Retrying for 
>>> 59.9993 more seconds.
>>>
>>> ERROR: In 
>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, 
>>> line 481
>>> vtkClientSocket (0xe6a060): Socket error in call to connect. Connection 
>>> refused.
>>>
>>> ERROR: In 
>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>  line 53
>>> vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111
>>>
>>> Warning: In 
>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>  line 250
>>> vtkTCPNetworkAccessManager (0x8356f0): Connect failed.  Retrying for 
>>> 58.9972 more seconds.
>>>
>>> mpiexec: killing job...
>>>
>>>
>>> Note the presence of only one connecting message.  Again, I apologize for 
>>> the mixup.  I spoke with our MPI guru and have confirmed that MPI appears 
>>> to be working correctly and I'm not making a mistake in how I launch 
>>> pvserver from the batch job perspective.
>>>
>>> Do you still want that output?
>>>
>>> On Nov 11, 2011, at 4:44 PM, Utkarsh Ayachit wrote:
>>>
>>>> That sounds very odd. If the process_id variable is indeed correctly set
>>>> to 0 and 1 on the two processes, then how come there are two "Waiting
>>>> for client" lines printed out in the first email that you sent?
>>>>
>>>> Can you change that cout line to the following to verify that both
>>>> processes are indeed printing from the same line of code?
>>>>
>>>> cout << __LINE__ << " : Waiting for client" << endl;
>>>>
>>>> (This is in pvserver_common.h, line 58.)
>>>>
>>>> Utkarsh
>>>>
>>>> On Fri, Nov 11, 2011 at 6:30 PM, Cook, Rich <[email protected]> wrote:
>>>>> I posted the CMakeCache.txt.  I also tried to step through the code
>>>>> using TotalView, and I can see it calling MPI_Init() etc.  It looks like
>>>>> one process correctly gets rank 0 and the other gets rank 1 (judging by
>>>>> the process_id variable in RealMain()).
>>>>> If I start in serial, it connects and I can view a protein molecule
>>>>> successfully.  If I start in parallel, exactly one server tries and fails
>>>>> to connect.  Am I supposed to give any extra arguments when starting in
>>>>> parallel?
>>>>> This is what I'm doing:
>>>>>
>>>>> mpiexec -np 2 
>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>  --use-offscreen-rendering  --reverse-connection  --client-host=localhost
>>>>>
>>>>>
>>>>>
>>>>> On Nov 11, 2011, at 11:11 AM, Utkarsh Ayachit wrote:
>>>>>
>>>>>> Can you post your CMakeCache.txt?
>>>>>>
>>>>>> Utkarsh
>>>>>>
>>>>>> On Fri, Nov 11, 2011 at 2:08 PM, Cook, Rich <[email protected]> wrote:
>>>>>>> Hi, thanks, but you are incorrect.
>>>>>>> I did set that variable and it was indeed compiled with MPI, as I said.
>>>>>>>
>>>>>>> rcook@prism127 (IMG_private): type pvserver
>>>>>>> pvserver is 
>>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>> rcook@prism127 (IMG_private): ldd  
>>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>>      libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 
>>>>>>> (0x00002aaaaacc9000)
>>>>>>>      libopen-rte.so.0 => 
>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 
>>>>>>> (0x00002aaaaaf6c000)
>>>>>>>      libopen-pal.so.0 => 
>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 
>>>>>>> (0x00002aaaab1b7000)
>>>>>>>      libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab434000)
>>>>>>>      libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaab638000)
>>>>>>>      libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab850000)
>>>>>>>      libm.so.6 => /lib64/libm.so.6 (0x00002aaaaba54000)
>>>>>>>      libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabcd7000)
>>>>>>>      libc.so.6 => /lib64/libc.so.6 (0x00002aaaabef2000)
>>>>>>>      /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
>>>>>>>
>>>>>>> When the pvservers are running, I can see that they are the correct 
>>>>>>> binaries, and ldd confirms they are MPI-capable.
>>>>>>>
>>>>>>> rcook@prism120 (~): ldd  
>>>>>>> /collab/usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/lib/paraview-3.12/pvserver
>>>>>>>  | grep mpi
>>>>>>>      libmpi_cxx.so.0 => 
>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi_cxx.so.0 
>>>>>>> (0x00002aaab23bf000)
>>>>>>>      libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 
>>>>>>> (0x00002aaab25da000)
>>>>>>>      libopen-rte.so.0 => 
>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 
>>>>>>> (0x00002aaab287d000)
>>>>>>>      libopen-pal.so.0 => 
>>>>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 
>>>>>>> (0x00002aaab2ac7000)
>>>>>>>
>>>>>>>
>>>>>>> On Nov 11, 2011, at 11:04 AM, Utkarsh Ayachit wrote:
>>>>>>>
>>>>>>>> Your pvserver is not built with MPI enabled. Please rebuild pvserver
>>>>>>>> with CMake variable PARAVIEW_USE_MPI:BOOL=ON.
>>>>>>>>
>>>>>>>> Utkarsh
>>>>>>>>
>>>>>>>> On Fri, Nov 11, 2011 at 1:54 PM, Cook, Rich <[email protected]> wrote:
>>>>>>>>> We have a tricky firewall situation here so I have to use reverse 
>>>>>>>>> tunneling per 
>>>>>>>>> http://www.paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_Connection_Over_an_ssh_Tunnel
>>>>>>>>>
>>>>>>>>> I'm not sure I'm doing it right.  I can do it with a single server, 
>>>>>>>>> but when I try to run in parallel, it looks like something is broken. 
>>>>>>>>> My understanding is that when launched under MPI, the servers should
>>>>>>>>> talk to each other and only one of the servers should try to connect
>>>>>>>>> back to the client.  I compiled with MPI, and am running in an MPI 
>>>>>>>>> environment, but it looks as though the pvservers are not talking to 
>>>>>>>>> each other but are each trying to make their own connection to the 
>>>>>>>>> client.  Below is the output.  Can anyone help me get this up and 
>>>>>>>>> running?  I know I'm close.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> rcook@prism127 (IMG_private): srun -n 8 
>>>>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>>>>  --use-offscreen-rendering  --reverse-connection  
>>>>>>>>> --client-host=localhost
>>>>>>>>> Waiting for client
>>>>>>>>> Connection URL: csrc://localhost:11111
>>>>>>>>> Client connected.
>>>>>>>>> Waiting for client
>>>>>>>>> Connection URL: csrc://localhost:11111
>>>>>>>>> ERROR: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>>>>  line 481
>>>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. 
>>>>>>>>> Connection refused.
>>>>>>>>>
>>>>>>>>> ERROR: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>>>>  line 53
>>>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server 
>>>>>>>>> localhost:11111
>>>>>>>>>
>>>>>>>>> Warning: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>>>>  line 250
>>>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed.  Retrying for 
>>>>>>>>> 59.9994 more seconds.
>>>>>>>>>
>>>>>>>>> ERROR: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>>>>  line 481
>>>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. 
>>>>>>>>> Connection refused.
>>>>>>>>>
>>>>>>>>> ERROR: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>>>>  line 53
>>>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server 
>>>>>>>>> localhost:11111
>>>>>>>>>
>>>>>>>>> Warning: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>>>>  line 250
>>>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed.  Retrying for 
>>>>>>>>> 58.9972 more seconds.
>>>>>>>>>
>>>>>>>>> ERROR: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>>>>  line 481
>>>>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. 
>>>>>>>>> Connection refused.
>>>>>>>>>
>>>>>>>>> ERROR: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>>>>  line 53
>>>>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server 
>>>>>>>>> localhost:11111
>>>>>>>>>
>>>>>>>>> Warning: In 
>>>>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>>>>  line 250
>>>>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed.  Retrying for 
>>>>>>>>> 57.9952 more seconds.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> etc. etc. etc.
>>>>>>>>> --
>>>>>>>>> ✐Richard Cook
>>>>>>>>> ✇ Lawrence Livermore National Laboratory
>>>>>>>>> Bldg-453 Rm-4024, Mail Stop L-557
>>>>>>>>> 7000 East Avenue,  Livermore, CA, 94550, USA
>>>>>>>>> ☎ (office) (925) 423-9605
>>>>>>>>> ☎ (fax) (925) 423-6961
>>>>>>>>> ---
>>>>>>>>> Information Management & Graphics Grp., Services & Development Div., 
>>>>>>>>> Integrated Computing & Communications Dept.
>>>>>>>>> (opinions expressed herein are mine and not those of LLNL)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Powered by www.kitware.com
>>>>>>>>>
>>>>>>>>> Visit other Kitware open-source projects at 
>>>>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>>>>>
>>>>>>>>> Please keep messages on-topic and check the ParaView Wiki at: 
>>>>>>>>> http://paraview.org/Wiki/ParaView
>>>>>>>>>
>>>>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>>>>> http://www.paraview.org/mailman/listinfo/paraview
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
