My bad.  
In my first email I was using the wrong MPI launcher (srun, which here uses 
MVAPICH, instead of mpiexec with OpenMPI).  So both processes were indeed being 
assigned the same process ID.  Please ignore that output.  
The current output looks like this:  

rcook@prism127 (~): mpiexec -np 2 
/usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver 
--use-offscreen-rendering  --reverse-connection  --client-host=localhost 
Waiting for client
Connection URL: csrc://localhost:11111
ERROR: In 
/nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 
481
vtkClientSocket (0xe6a060): Socket error in call to connect. Connection refused.

ERROR: In 
/nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, 
line 53
vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111

Warning: In 
/nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
 line 250
vtkTCPNetworkAccessManager (0x8356f0): Connect failed.  Retrying for 59.9993 
more seconds.

ERROR: In 
/nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx, line 
481
vtkClientSocket (0xe6a060): Socket error in call to connect. Connection refused.

ERROR: In 
/nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx, 
line 53
vtkClientSocket (0xe6a060): Failed to connect to server localhost:11111

Warning: In 
/nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
 line 250
vtkTCPNetworkAccessManager (0x8356f0): Connect failed.  Retrying for 58.9972 
more seconds.

mpiexec: killing job...


Note the presence of only one connecting message.  Again, I apologize for the 
mix-up.  I spoke with our MPI guru and confirmed that MPI appears to be working 
correctly and that I'm not making a mistake in how I launch pvserver from the 
batch job.  
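
For what it's worth, the "Connection refused" errors above just mean that 
nothing was listening on localhost:11111 at the moment pvserver dialed out 
(e.g. the ssh tunnel or the ParaView client wasn't up yet).  A minimal sketch 
of that check, assuming Python is available on the node (the probe below is my 
own illustration, not part of pvserver):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts a connection at host:port.

    A False result corresponds to the "Connection refused" errors that
    pvserver logs when the ParaView client (or its ssh tunnel) is not
    yet listening on the reverse-connection port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the default ParaView reverse-connection port before launching
# pvserver; True means something is already listening there.
print(can_connect("localhost", 11111))
```

Running this before launching pvserver tells you whether the tunnel end is up, 
independent of any MPI issues.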

Do you still want that output?  

On Nov 11, 2011, at 4:44 PM, Utkarsh Ayachit wrote:

> That sounds very odd. If process_id variable is indeed correctly set
> to 0 and 1 on the two processes, then how come there are two "Waiting
> for client" lines printed out in the first email that you sent?
> 
> Can you change that cout line to the following to verify that both
> processes are indeed printing from the same line?
> 
> cout << __LINE__ << " : Waiting for client" << endl;
> 
> (This is in pvserver_common.h: 58)
> 
> Utkarsh
> 
> On Fri, Nov 11, 2011 at 6:30 PM, Cook, Rich <[email protected]> wrote:
>> I posted the CMakeCache.txt.  I also have tried to step through the code 
>> using TotalView and I can see it calling MPI_Init() etc.  It looks like one 
>> process correctly gets rank 0 and one gets rank 1 (by inspecting process_id 
>> variable in RealMain())
>> If I start in serial, it connects and I can view a protein molecule 
>> successfully.  If I start in parallel, exactly one server tries and fails to 
>> connect.  Am I supposed to give any extra arguments when starting in 
>> parallel?
>> This is what I'm doing:
>> 
>>  mpiexec -np 2 
>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>  --use-offscreen-rendering  --reverse-connection  --client-host=localhost
>> 
>> 
>> 
>> On Nov 11, 2011, at 11:11 AM, Utkarsh Ayachit wrote:
>> 
>>> Can you post your CMakeCache.txt?
>>> 
>>> Utkarsh
>>> 
>>> On Fri, Nov 11, 2011 at 2:08 PM, Cook, Rich <[email protected]> wrote:
>>>> Hi, thanks, but you are incorrect.
>>>> I did set that variable and it was indeed compiled with MPI, as I said.
>>>> 
>>>> rcook@prism127 (IMG_private): type pvserver
>>>> pvserver is 
>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>> rcook@prism127 (IMG_private): ldd  
>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>        libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 
>>>> (0x00002aaaaacc9000)
>>>>        libopen-rte.so.0 => 
>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 
>>>> (0x00002aaaaaf6c000)
>>>>        libopen-pal.so.0 => 
>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 
>>>> (0x00002aaaab1b7000)
>>>>        libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab434000)
>>>>        libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaab638000)
>>>>        libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaab850000)
>>>>        libm.so.6 => /lib64/libm.so.6 (0x00002aaaaba54000)
>>>>        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaabcd7000)
>>>>        libc.so.6 => /lib64/libc.so.6 (0x00002aaaabef2000)
>>>>        /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
>>>> 
>>>> When the pvservers are running, I can see that they are the correct 
>>>> binaries, and ldd confirms they are MPI-capable.
>>>> 
>>>> rcook@prism120 (~): ldd  
>>>> /collab/usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/lib/paraview-3.12/pvserver
>>>>  | grep mpi
>>>>        libmpi_cxx.so.0 => 
>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi_cxx.so.0 (0x00002aaab23bf000)
>>>>        libmpi.so.0 => /usr/local/tools/openmpi-gnu-1.4.3/lib/libmpi.so.0 
>>>> (0x00002aaab25da000)
>>>>        libopen-rte.so.0 => 
>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-rte.so.0 
>>>> (0x00002aaab287d000)
>>>>        libopen-pal.so.0 => 
>>>> /usr/local/tools/openmpi-gnu-1.4.3/lib/libopen-pal.so.0 
>>>> (0x00002aaab2ac7000)
>>>> 
>>>> 
>>>> On Nov 11, 2011, at 11:04 AM, Utkarsh Ayachit wrote:
>>>> 
>>>>> Your pvserver is not built with MPI enabled. Please rebuild pvserver
>>>>> with CMake variable PARAVIEW_USE_MPI:BOOL=ON.
>>>>> 
>>>>> Utkarsh
>>>>> 
>>>>> On Fri, Nov 11, 2011 at 1:54 PM, Cook, Rich <[email protected]> wrote:
>>>>>> We have a tricky firewall situation here so I have to use reverse 
>>>>>> tunneling per 
>>>>>> http://www.paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_Connection_Over_an_ssh_Tunnel
>>>>>> 
>>>>>> I'm not sure I'm doing it right.  I can do it with a single server, but 
>>>>>> when I try to run in parallel, it looks like something is broken.  My 
>>>>>> understanding is that when launched under MPI, the servers should talk 
>>>>>> to each other and only one of the servers should try to connect back to 
>>>>>> the client.  I compiled with MPI, and am running in an MPI environment, 
>>>>>> but it looks as though the pvservers are not talking to each other but 
>>>>>> are each trying to make their own connection to the client.  Below is 
>>>>>> the output.  Can anyone help me get this up and running?  I know I'm 
>>>>>> close.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> rcook@prism127 (IMG_private): srun -n 8 
>>>>>> /usr/global/tools/Kitware/Paraview/3.12.0-OSMesa/chaos_4_x86_64_ib/bin/pvserver
>>>>>>  --use-offscreen-rendering  --reverse-connection  --client-host=localhost
>>>>>> Waiting for client
>>>>>> Connection URL: csrc://localhost:11111
>>>>>> Client connected.
>>>>>> Waiting for client
>>>>>> Connection URL: csrc://localhost:11111
>>>>>> ERROR: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>  line 481
>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection 
>>>>>> refused.
>>>>>> 
>>>>>> ERROR: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>  line 53
>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>> 
>>>>>> Warning: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>  line 250
>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed.  Retrying for 
>>>>>> 59.9994 more seconds.
>>>>>> 
>>>>>> ERROR: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>  line 481
>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection 
>>>>>> refused.
>>>>>> 
>>>>>> ERROR: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>  line 53
>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>> 
>>>>>> Warning: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>  line 250
>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed.  Retrying for 
>>>>>> 58.9972 more seconds.
>>>>>> 
>>>>>> ERROR: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkSocket.cxx,
>>>>>>  line 481
>>>>>> vtkClientSocket (0xd8ee20): Socket error in call to connect. Connection 
>>>>>> refused.
>>>>>> 
>>>>>> ERROR: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/VTK/Common/vtkClientSocket.cxx,
>>>>>>  line 53
>>>>>> vtkClientSocket (0xd8ee20): Failed to connect to server localhost:11111
>>>>>> 
>>>>>> Warning: In 
>>>>>> /nfs/tmp2/rcook/ParaView/3.12.0/ParaView-3.12.0/ParaViewCore/ClientServerCore/vtkTCPNetworkAccessManager.cxx,
>>>>>>  line 250
>>>>>> vtkTCPNetworkAccessManager (0x6619a0): Connect failed.  Retrying for 
>>>>>> 57.9952 more seconds.
>>>>>> 
>>>>>> 
>>>>>> etc. etc. etc.
>>>>>> --
>>>>>> ✐Richard Cook
>>>>>> ✇ Lawrence Livermore National Laboratory
>>>>>> Bldg-453 Rm-4024, Mail Stop L-557
>>>>>> 7000 East Avenue,  Livermore, CA, 94550, USA
>>>>>> ☎ (office) (925) 423-9605
>>>>>> ☎ (fax) (925) 423-6961
>>>>>> ---
>>>>>> Information Management & Graphics Grp., Services & Development Div., 
>>>>>> Integrated Computing & Communications Dept.
>>>>>> (opinions expressed herein are mine and not those of LLNL)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Powered by www.kitware.com
>>>>>> 
>>>>>> Visit other Kitware open-source projects at 
>>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>> 
>>>>>> Please keep messages on-topic and check the ParaView Wiki at: 
>>>>>> http://paraview.org/Wiki/ParaView
>>>>>> 
>>>>>> Follow this link to subscribe/unsubscribe:
>>>>>> http://www.paraview.org/mailman/listinfo/paraview
>>>>>> 
>>>> 
>> 



