Posting back to the mailing list to see if anyone else has any idea what's going on.
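For anyone joining the thread here: the setup in question is pvserver spread over several compute nodes with mpirun and a hostfile, and the crash happens the moment the ParaView client connects. A minimal sketch of such a launch (the hostfile contents, slot counts and pvserver path are illustrative, not Martin's exact ones) looks like:

    # hosts.txt -- one line per node, Open MPI hostfile syntax
    cp003158 slots=2
    cp003159 slots=2
    cp003162 slots=2
    cp003163 slots=2

    mpirun -np 8 --hostfile hosts.txt /path/to/pvserver

The client is then pointed at the first server host on pvserver's default port (11111) from the GUI. With all the ranks on a single host the same launch works; spread over multiple nodes it dies at connect time with the "Server Connection Closed!" errors quoted below.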
On Tue, Mar 16, 2010 at 12:08 PM, SCHROEDER, Martin <[email protected]> wrote: > Hm, I don't think it's broken, because it works under some circumstances (on > only one host, or on multiple hosts with pvserver c/s stream logging turned on). > > With 2 processes on two hosts and valgrind attached, it works. > I even get some messages when connecting the client to the server: > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 1. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 1. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 1. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > ICET,1:ERROR: icetDisplayNodes: Invalid rank for tile 2. > > After correcting the tile settings for pvserver, it works. > > > With 8 processes, 2 on each of 4 hosts, it crashes as described before. > > > The mpirun debug output for the last try was: > > [cp003158:17549] procdir: /tmp/openmpi-sessions-ya06...@cp003158_0/2374/0/1 > [cp003159:32354] procdir: /tmp/openmpi-sessions-ya06...@cp003159_0/2374/0/2 > [cp003158:17549] jobdir: /tmp/openmpi-sessions-ya06...@cp003158_0/2374/0 > [cp003158:17549] top: openmpi-sessions-ya06...@cp003158_0 > [cp003158:17549] tmp: /tmp > [cp003159:32354] jobdir: /tmp/openmpi-sessions-ya06...@cp003159_0/2374/0 > [cp003159:32354] top: openmpi-sessions-ya06...@cp003159_0 > [cp003159:32354] tmp: /tmp > [cp003162:31564] procdir: /tmp/openmpi-sessions-ya06...@cp003162_0/2374/0/3 > [cp003163:31530] procdir: /tmp/openmpi-sessions-ya06...@cp003163_0/2374/0/4 > [cp003163:31530] jobdir: /tmp/openmpi-sessions-ya06...@cp003163_0/2374/0 > [cp003163:31530] top: openmpi-sessions-ya06...@cp003163_0 > [cp003163:31530] tmp: /tmp > [cp002860:20714] [[2374,0],0] node[0].name cp002860 daemon 0 arch ffc91200 > [cp002860:20714] [[2374,0],0] node[1].name cp003158 daemon 1 arch ffc91200 > [cp002860:20714] [[2374,0],0] node[2].name cp003159 daemon 2 arch ffc91200 > [cp002860:20714] [[2374,0],0] node[3].name cp003162 daemon 3 arch ffc91200 > [cp002860:20714] [[2374,0],0] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003158:17549] [[2374,0],1] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003159:32354] [[2374,0],2] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003158:17549] [[2374,0],1] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003158:17549] [[2374,0],1] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003158:17549] [[2374,0],1] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003158:17549] [[2374,0],1] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003159:32354] [[2374,0],2] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003159:32354] [[2374,0],2] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003159:32354] [[2374,0],2] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003159:32354] [[2374,0],2] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003162:31564] jobdir: /tmp/openmpi-sessions-ya06...@cp003162_0/2374/0 > [cp003162:31564] top: openmpi-sessions-ya06...@cp003162_0 > [cp003162:31564] tmp: /tmp > [cp003162:31564] [[2374,0],3] 
node[0].name cp002860 daemon 0 arch ffc91200 > [cp003162:31564] [[2374,0],3] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003162:31564] [[2374,0],3] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003162:31564] [[2374,0],3] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003162:31564] [[2374,0],3] node[4].name cp003163 daemon 4 arch ffc91200 > [cp002860:20714] Info: Setting up debugger process table for applications > MPIR_being_debugged = 0 > MPIR_debug_state = 1 > MPIR_partial_attach_ok = 1 > MPIR_i_am_starter = 0 > MPIR_proctable_size = 8 > MPIR_proctable: > (i, host, exe, pid) = (0, cp003158, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 17550) > (i, host, exe, pid) = (1, cp003158, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 17551) > (i, host, exe, pid) = (2, cp003159, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 32355) > (i, host, exe, pid) = (3, cp003159, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 32356) > (i, host, exe, pid) = (4, cp003162, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31565) > (i, host, exe, pid) = (5, cp003162, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31566) > (i, host, exe, pid) = (6, cp003163, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31531) > (i, host, exe, pid) = (7, cp003163, > /yaprod/freeware/Linux_x86_64/app/Paraview-3.6.2-OpenMPI/bin/xterm, 31532) > [cp003163:31530] [[2374,0],4] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003163:31530] [[2374,0],4] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003163:31530] [[2374,0],4] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003163:31530] [[2374,0],4] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003163:31530] [[2374,0],4] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003158:17562] procdir: /tmp/openmpi-sessions-ya06...@cp003158_0/2374/1/1 > [cp003158:17562] jobdir: /tmp/openmpi-sessions-ya06...@cp003158_0/2374/1 > [cp003158:17562] top: openmpi-sessions-ya06...@cp003158_0 > [cp003158:17562] tmp: /tmp > [cp003158:17563] procdir: /tmp/openmpi-sessions-ya06...@cp003158_0/2374/1/0 > [cp003158:17563] jobdir: /tmp/openmpi-sessions-ya06...@cp003158_0/2374/1 > [cp003158:17563] top: openmpi-sessions-ya06...@cp003158_0 > [cp003158:17563] tmp: /tmp > [cp003158:17562] [[2374,1],1] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003158:17562] [[2374,1],1] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003158:17562] [[2374,1],1] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003158:17562] [[2374,1],1] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003158:17562] [[2374,1],1] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003158:17563] [[2374,1],0] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003158:17563] [[2374,1],0] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003158:17563] [[2374,1],0] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003158:17563] [[2374,1],0] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003158:17563] [[2374,1],0] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003159:32370] procdir: /tmp/openmpi-sessions-ya06...@cp003159_0/2374/1/3 > [cp003159:32370] jobdir: /tmp/openmpi-sessions-ya06...@cp003159_0/2374/1 > [cp003159:32370] top: openmpi-sessions-ya06...@cp003159_0 > [cp003159:32370] tmp: /tmp > [cp003159:32370] [[2374,1],3] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003159:32370] [[2374,1],3] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003159:32370] 
[[2374,1],3] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003159:32370] [[2374,1],3] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003159:32370] [[2374,1],3] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003159:32369] procdir: /tmp/openmpi-sessions-ya06...@cp003159_0/2374/1/2 > [cp003159:32369] jobdir: /tmp/openmpi-sessions-ya06...@cp003159_0/2374/1 > [cp003159:32369] top: openmpi-sessions-ya06...@cp003159_0 > [cp003159:32369] tmp: /tmp > [cp003159:32369] [[2374,1],2] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003159:32369] [[2374,1],2] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003159:32369] [[2374,1],2] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003159:32369] [[2374,1],2] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003159:32369] [[2374,1],2] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003162:31580] procdir: /tmp/openmpi-sessions-ya06...@cp003162_0/2374/1/4 > [cp003162:31579] procdir: /tmp/openmpi-sessions-ya06...@cp003162_0/2374/1/5 > [cp003162:31579] jobdir: /tmp/openmpi-sessions-ya06...@cp003162_0/2374/1 > [cp003162:31579] top: openmpi-sessions-ya06...@cp003162_0 > [cp003162:31579] tmp: /tmp > [cp003162:31580] jobdir: /tmp/openmpi-sessions-ya06...@cp003162_0/2374/1 > [cp003162:31580] top: openmpi-sessions-ya06...@cp003162_0 > [cp003162:31580] tmp: /tmp > [cp003162:31579] [[2374,1],5] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003162:31579] [[2374,1],5] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003162:31579] [[2374,1],5] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003162:31579] [[2374,1],5] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003162:31579] [[2374,1],5] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003162:31580] [[2374,1],4] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003162:31580] [[2374,1],4] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003162:31580] [[2374,1],4] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003162:31580] [[2374,1],4] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003162:31580] [[2374,1],4] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003163:31545] procdir: /tmp/openmpi-sessions-ya06...@cp003163_0/2374/1/6 > [cp003163:31546] procdir: /tmp/openmpi-sessions-ya06...@cp003163_0/2374/1/7 > [cp003163:31546] jobdir: /tmp/openmpi-sessions-ya06...@cp003163_0/2374/1 > [cp003163:31546] top: openmpi-sessions-ya06...@cp003163_0 > [cp003163:31546] tmp: /tmp > [cp003163:31545] jobdir: /tmp/openmpi-sessions-ya06...@cp003163_0/2374/1 > [cp003163:31545] top: openmpi-sessions-ya06...@cp003163_0 > [cp003163:31545] tmp: /tmp > [cp003163:31545] [[2374,1],6] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003163:31545] [[2374,1],6] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003163:31545] [[2374,1],6] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003163:31545] [[2374,1],6] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003163:31545] [[2374,1],6] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003163:31546] [[2374,1],7] node[0].name cp002860 daemon 0 arch ffc91200 > [cp003163:31546] [[2374,1],7] node[1].name cp003158 daemon 1 arch ffc91200 > [cp003163:31546] [[2374,1],7] node[2].name cp003159 daemon 2 arch ffc91200 > [cp003163:31546] [[2374,1],7] node[3].name cp003162 daemon 3 arch ffc91200 > [cp003163:31546] [[2374,1],7] node[4].name cp003163 daemon 4 arch ffc91200 > [cp003163:31530] sess_dir_finalize: proc session dir not empty - leaving > -------------------------------------------------------------------------- > mpirun has exited due to process rank 6 with PID 
31531 on > node cp003163 exiting without calling "finalize". This may > have caused other processes in the application to be > terminated by signals sent by mpirun (as reported here). > -------------------------------------------------------------------------- > [cp003163:31530] sess_dir_finalize: job session dir not empty - leaving > [cp003162:31564] sess_dir_finalize: proc session dir not empty - leaving > [cp003159:32354] sess_dir_finalize: job session dir not empty - leaving > [cp003158:17549] sess_dir_finalize: job session dir not empty - leaving > [cp002860:20714] sess_dir_finalize: job session dir not empty - leaving > [cp002860:20714] sess_dir_finalize: proc session dir not empty - leaving > orterun: exiting with status 1 > > martin > > -----Original Message----- > From: Utkarsh Ayachit [mailto:[email protected]] > Sent: Tuesday, March 16, 2010 15:00 > To: SCHROEDER, Martin > Cc: ParaView > Subject: Re: [Paraview] Paraview 3.6.2 / Open MPI 1.4.1: Server Connection > Closed! / Server failed to gather information./cslog > > I am not sure why that could be the case. The only thing that happens on > setting cslog is that each server process starts writing out an output > log file. Also I am not sure why MPI would hang on attaching a debugger. Try > debugging by just running 2 processes. Is it possible you have a broken MPI? > > Utkarsh > > > > On Tue, Mar 16, 2010 at 9:54 AM, SCHROEDER, Martin <[email protected]> > wrote: >> Hm, debugging seems more difficult than I thought. mpirun seems to hang when >> the debugging option is set. >> I also wonder why this "connection reset by peer" problem doesn't occur when >> the option "--cslog=somefile" is set... >> >> >> -----Original Message----- >> From: SCHROEDER, Martin >> Sent: Monday, March 15, 2010 14:33 >> To: 'Utkarsh Ayachit' >> Subject: Re: [Paraview] Paraview 3.6.2 / Open MPI 1.4.1: Server >> Connection Closed! / Server failed to gather information./cslog >> >> Yes, it is possible. I will try it and send you the output. >> Meanwhile, mpirun sometimes brought back this message: >> >> btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv >> failed: Connection reset by peer (104) >> >> >> >> -----Original Message----- >> From: Utkarsh Ayachit [mailto:[email protected]] >> Sent: Friday, March 12, 2010 15:41 >> To: SCHROEDER, Martin >> Cc: [email protected] >> Subject: Re: [Paraview] Paraview 3.6.2 / Open MPI 1.4.1: Server >> Connection Closed! / Server failed to gather information./cslog >> >> Is it possible to attach a debugger to the server processes and see where it >> crashes? >> >> On Fri, Mar 12, 2010 at 7:03 AM, SCHROEDER, Martin <[email protected]> >> wrote: >>> Hello, >>> when I'm trying to run ParaView (pvserver) on a single host using >>> mpirun with 4-8 processes, it works. >>> The problem is: >>> when I'm trying to spread pvserver over multiple hosts, using mpirun >>> and a hostfile, the server processes and the client crash when I >>> connect the client to the server. >>> >>> I'm getting these messages in the client's shell: >>> >>> ERROR: In >>> /yatest/cae/src/Paraview3.6.2/ParaView3/Servers/Common/vtkServerConnection.cxx, >>> line 67 >>> vtkServerConnection (0x1140c30): Server Connection Closed! >>> >>> ERROR: In >>> /yatest/cae/src/Paraview3.6.2/ParaView3/Servers/Common/vtkServerConnection.cxx, >>> line 345 >>> vtkServerConnection (0x1140c30): Server failed to gather information. 
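A note on the debugging attempts quoted above: the xterm entries in mpirun's proctable look as if xterm was being launched for each rank, presumably to attach a debugger as suggested earlier in the thread. The usual recipe for getting one gdb per rank is along these lines (pvserver path illustrative):

    mpirun -np 2 --hostfile hosts.txt xterm -e gdb /path/to/pvserver

One xterm pops up per rank; typing "run" in each starts pvserver under gdb, and the backtrace at the crash is what would pin this down. The catch is that every remote rank must be able to open an X display back on your desktop (DISPLAY set or forwarded on each node); if it can't, the xterms never appear and the whole mpirun just seems to hang, which may be what was happening when the "debugging option" was set. Separately, the "icetDisplayNodes: Invalid rank for tile" messages further up are, as Martin found, a matter of the pvserver tile-display settings not matching the number of ranks; if I remember right those are the --tile-dimensions-x/-tdx and --tile-dimensions-y/-tdy options in 3.6.x, but that is worth confirming against pvserver --help.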
>>> >>> If I use the option cslog=/home/.../cstream.log when executing >>> pvserver, it works slowly, but it does work on two hosts with 4 processes on >>> each host. >>> >>> The ParaView client and server are the same version, 3.6.2; Open MPI is >>> 1.4.1. >>> >>> Has anyone experienced the same? >>> Any hint would be great. >>> >>> Mit freundlichen Gruessen / Best regards >>> >>> Martin Schröder, FIEA >>> MTU Aero Engines GmbH >>> Engineering Systems (CAE) >>> Dachauer Str. 665 >>> 80995 Muenchen >>> Germany >>> >>> Tel +49 (0)89 14 89 57 20 >>> Fax +49 (0)89 14 89-96 89 4 >>> mailto:[email protected] >>> http://www.mtu.de
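For completeness, the variant Martin reports as working (slowly, but working) is the same kind of launch with client-server stream logging switched on; in his case 4 processes on each of two hosts, roughly (hostfile and log path illustrative):

    mpirun -np 8 --hostfile hosts.txt /path/to/pvserver --cslog=/tmp/cstream.log

As noted earlier in the thread, the only documented effect of --cslog is that each server process writes out a stream log file, so it remains unclear why enabling it changes whether the connection survives.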
_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview
