Hello again,

after attaching gdb to mpirun, the backtrace when it hangs is:
(gdb) bt
#0  0x00002b039f74169d in poll () from /usr/lib64/libc.so.6
#1  0x00002b039e1a9c42 in poll_dispatch () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#2  0x00002b039e1a2751 in opal_libevent2022_event_base_loop () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#3  0x00000000004056ef in orterun (argc=13, argv=0x7ffef20a79f8) at 
orterun.c:1057
#4  0x00000000004035a0 in main (argc=13, argv=0x7ffef20a79f8) at main.c:13
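
For reference, I attached roughly like this (the pid is the one of mpirun
as reported by ps; judging from the pstack output below it should be 11690):

   gdb -p 11690
   (gdb) bt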

Using pstack on mpirun I see several threads; the output is below:

Thread 5 (Thread 0x2b03a33b0700 (LWP 11691)):
#0  0x00002b039f743413 in select () from /usr/lib64/libc.so.6
#1  0x00002b039c599979 in listen_thread () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-rte.so.20
#2  0x00002b039defedc5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x00002b039f74bced in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x2b03a3be9700 (LWP 11692)):
#0  0x00002b039f74c2c3 in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00002b039e1a0f42 in epoll_dispatch () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#2  0x00002b039e1a2751 in opal_libevent2022_event_base_loop () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#3  0x00002b039e1fa996 in progress_engine () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#4  0x00002b039defedc5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00002b039f74bced in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x2b03a3dea700 (LWP 11693)):
#0  0x00002b039f743413 in select () from /usr/lib64/libc.so.6
#1  0x00002b039e1f3a5f in listen_thread () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#2  0x00002b039defedc5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x00002b039f74bced in clone () from /usr/lib64/libc.so.6
Thread 2 (Thread 0x2b03a3feb700 (LWP 11694)):
#0  0x00002b039f743413 in select () from /usr/lib64/libc.so.6
#1  0x00002b039c55616b in listen_thread_fn () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-rte.so.20
#2  0x00002b039defedc5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x00002b039f74bced in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x2b039c324100 (LWP 11690)):
#0  0x00002b039f74169d in poll () from /usr/lib64/libc.so.6
#1  0x00002b039e1a9c42 in poll_dispatch () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#2  0x00002b039e1a2751 in opal_libevent2022_event_base_loop () from 
/cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
#3  0x00000000004056ef in orterun (argc=13, argv=0x7ffef20a79f8) at 
orterun.c:1057
#4  0x00000000004035a0 in main (argc=13, argv=0x7ffef20a79f8) at main.c:13
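
(That was simply "pstack 11690"; 11690 is presumably the pid of mpirun,
i.e. the LWP of Thread 1 above.)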

Best Regards

Christof



On Wed, Dec 07, 2016 at 02:07:27PM +0100, Christof Koehler wrote:
> Hello,
> 
> thank you for the fast answer.
> 
> On Wed, Dec 07, 2016 at 08:23:43PM +0900, Gilles Gouaillardet wrote:
> > Christoph,
> > 
> > can you please try again with
> > 
> > mpirun --mca btl tcp,self --mca pml ob1 ...
> 
> mpirun -n 20 --mca btl tcp,self --mca pml ob1 
> /cluster/vasp/5.3.5/intel2016/openmpi-2.0/bin/vasp-mpi
> 
> This still deadlocks/hangs; the option has no effect.
> 
> > mpirun --mca btl tcp,self --mca pml ob1 --mca coll ^tuned ...
> mpirun -n 20 --mca btl tcp,self --mca pml ob1 --mca coll ^tuned 
> /cluster/vasp/5.3.5/intel2016/openmpi-2.0/bin/vasp-mpi
> 
> This still deadlocks/hangs; the option has no effect. There is, however,
> additional output:
> 
> wannier90 error: examine the output/error file for details
> [node109][[55572,1],16][btl_tcp_frag.c:230:mca_btl_tcp_frag_recv] 
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer
> (104)[node109][[55572,1],8][btl_tcp_frag.c:230:mca_btl_tcp_frag_recv] 
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node109][[55572,1],4][btl_tcp_frag.c:230:mca_btl_tcp_frag_recv] 
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node109][[55572,1],1][btl_tcp_frag.c:230:mca_btl_tcp_frag_recv] 
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node109][[55572,1],2][btl_tcp_frag.c:230:mca_btl_tcp_frag_recv] 
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> 
> Please note: the "wannier90 error: examine the output/error file for
> details" message is expected; there is in fact an error in the input file,
> and the program is supposed to terminate at this point.
> 
> However, with mvapich2 and openmpi 1.10.4 it terminates
> completely, i.e. I get my shell prompt back. Whether a segfault is involved
> with mvapich2 (as is apparently the case with openmpi 1.10.4, based on the
> termination message) I do not know. I tried
> 
> export MV2_DEBUG_SHOW_BACKTRACE=1
> mpirun -n 20  /cluster/vasp/5.3.5/intel2016/mvapich2-2.2/bin/vasp-mpi
> 
> but did not get any indication of a problem (segfault); the last lines of
> output are
> 
>  calculate QP shifts <psi_nk| G(iteration)W_0 |psi_nk>: iteration 1
>  writing wavefunctions
> wannier90 error: examine the output/error file for details
> node109 14:00 /scratch/ckoe/gw %
> 
> The last line is my shell prompt.
> 
> > 
> > if everything fails, can you describe how MPI_Allreduce is invoked ?
> > /* number of tasks, datatype, number of elements */
> That is difficult; this is not our code in the first place [1], and the
> problem occurs when using an ("officially" supported) third-party library [2].
> 
> From the stack trace of the hanging process, the vasp routine which calls
> allreduce is "m_sum_i_"; that is in the mpi.F source file. Allreduce is
> called as
> 
> CALL MPI_ALLREDUCE( MPI_IN_PLACE, ivec(1), n, MPI_INTEGER, &
>          &                MPI_SUM, COMM%MPI_COMM, ierror )
> 
> n and ivec(1) are of data type integer. It was originally run with 20 ranks;
> I have now tried 2 ranks as well, and it hangs, too. With one (!) rank
> 
> mpirun -n 1 --mca btl tcp,self --mca pml ob1 --mca coll ^tuned 
> /cluster/vasp/5.3.5/intel2016/openmpi-2.0/bin/vasp-mpi
> 
> I of course get a shell prompt back. 
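> 
> Coming back to the MPI_ALLREDUCE call above: if it helps, a minimal
> standalone program using the same call pattern would look roughly like the
> sketch below. This is not the actual VASP code; the vector length n, the
> data in ivec and the communicator are just placeholders here.
> 
>   ! sketch of an in-place integer allreduce as in m_sum_i_ (placeholders only)
>   program allreduce_sketch
>     use mpi
>     implicit none
>     integer, parameter :: n = 1024
>     integer :: ivec(n), ierror, rank
>     call MPI_INIT(ierror)
>     call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>     ivec = rank                       ! arbitrary test data
>     call MPI_ALLREDUCE(MPI_IN_PLACE, ivec(1), n, MPI_INTEGER, &
>          &             MPI_SUM, MPI_COMM_WORLD, ierror)
>     call MPI_FINALIZE(ierror)
>   end program allreduce_sketch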
> 
> I then started it normally in the shell with 2 ranks
> mpirun -n 2 --mca btl tcp,self --mca pml ob1 --mca coll ^tuned 
> /cluster/vasp/5.3.5/intel2016/openmpi-2.0/bin/vasp-mpi
> and attached gdb to the rank with the lowest pid (3478). I do not get a
> prompt back (it hangs), the second rank 3479 is still at 100 % CPU, and
> mpirun is still a process I can see with "ps", but gdb says
> (gdb) continue     <- that is where I attached it !
> Continuing.
> [Thread 0x2b8366806700 (LWP 3480) exited]
> [Thread 0x2b835da1c040 (LWP 3478) exited]
> [Inferior 1 (process 3478) exited normally]
> (gdb) bt
> No stack.
> 
> So, as far as gdb is concerned, the rank with the lowest pid (which is
> gone, while the other rank is still eating CPU time) terminated normally?
> 
> I hope this helps. I have only very basic experience with debuggers
> (I never really needed them) and even less with using them in parallel.
> I can try to capture the contents of ivec, but I do not think that would
> be helpful? If you need them I can try of course; I have no idea how
> large the vector is.
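> 
> If the contents are wanted after all, something along these lines might
> work once gdb is attached to the hanging rank (just a sketch; it assumes
> the frame numbering from the stack trace below and that symbols for
> m_sum_i_ are resolvable, which they may not be):
> 
>   (gdb) frame 10            <- the m_sum_i_ frame in the trace
>   (gdb) print n
>   (gdb) print ivec(1)@16    <- first 16 elements of the vector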
> 
> 
> Best Regards
> 
> Christof
> 
> [1] https://www.vasp.at/
> [2] http://www.wannier.org/, Old version 1.2
> > 
> > 
> > 
> > Cheers,
> > 
> > Gilles
> > 
> > On Wed, Dec 7, 2016 at 7:38 PM, Christof Koehler
> > <christof.koeh...@bccms.uni-bremen.de> wrote:
> > > Hello everybody,
> > >
> > > I am observing a deadlock in allreduce with openmpi 2.0.1 on a single
> > > node. A stack trace (pstack) of one rank is below, showing the program
> > > (vasp 5.3.5) and the two psm2 progress threads. However:
> > >
> > > In fact, the vasp input is not ok and the program should abort at the
> > > point where it hangs. It does when using mvapich 2.2. With openmpi 2.0.1
> > > it just deadlocks in some allreduce operation. Originally it was started
> > > with 20 ranks; when it hangs there are only 19 left. From the PIDs I
> > > would assume it is the master rank which is missing. So, this looks like
> > > a failure to terminate.
> > >
> > > With 1.10 I get a clean
> > > --------------------------------------------------------------------------
> > > mpiexec noticed that process rank 0 with PID 18789 on node node109
> > > exited on signal 11 (Segmentation fault).
> > > --------------------------------------------------------------------------
> > >
> > > Any ideas what to try ? Of course in this situation it may well be the
> > > program. Still, with the observed difference between 2.0.1 and 1.10 (and
> > > mvapich) this might be interesting to someone.
> > >
> > > Best Regards
> > >
> > > Christof
> > >
> > >
> > > Thread 3 (Thread 0x2ad362577700 (LWP 4629)):
> > > #0  0x00002ad35b1562c3 in epoll_wait () from /lib64/libc.so.6
> > > #1  0x00002ad35d114f42 in epoll_dispatch () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #2  0x00002ad35d116751 in opal_libevent2022_event_base_loop () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #3  0x00002ad35d16e996 in progress_engine () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #4  0x00002ad359efbdc5 in start_thread () from /lib64/libpthread.so.0
> > > #5  0x00002ad35b155ced in clone () from /lib64/libc.so.6
> > > Thread 2 (Thread 0x2ad362778700 (LWP 4640)):
> > > #0  0x00002ad35b14b69d in poll () from /lib64/libc.so.6
> > > #1  0x00002ad35d11dc42 in poll_dispatch () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #2  0x00002ad35d116751 in opal_libevent2022_event_base_loop () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #3  0x00002ad35d0c61d1 in progress_engine () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #4  0x00002ad359efbdc5 in start_thread () from /lib64/libpthread.so.0
> > > #5  0x00002ad35b155ced in clone () from /lib64/libc.so.6
> > > Thread 1 (Thread 0x2ad35978d040 (LWP 4609)):
> > > #0  0x00002ad35b14b69d in poll () from /lib64/libc.so.6
> > > #1  0x00002ad35d11dc42 in poll_dispatch () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #2  0x00002ad35d116751 in opal_libevent2022_event_base_loop () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #3  0x00002ad35d0c28cf in opal_progress () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libopen-pal.so.20
> > > #4  0x00002ad35adce8d8 in ompi_request_wait_completion () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libmpi.so.20
> > > #5  0x00002ad35adce838 in mca_pml_cm_recv () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libmpi.so.20
> > > #6  0x00002ad35ad4da42 in 
> > > ompi_coll_base_allreduce_intra_recursivedoubling () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libmpi.so.20
> > > #7  0x00002ad35ad52906 in ompi_coll_tuned_allreduce_intra_dec_fixed () 
> > > from /cluster/mpi/openmpi/2.0.1/intel2016/lib/libmpi.so.20
> > > #8  0x00002ad35ad1f0f4 in PMPI_Allreduce () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libmpi.so.20
> > > #9  0x00002ad35aa99c38 in pmpi_allreduce__ () from 
> > > /cluster/mpi/openmpi/2.0.1/intel2016/lib/libmpi_mpifh.so.20
> > > #10 0x000000000045f8c6 in m_sum_i_ ()
> > > #11 0x0000000000e1ce69 in mlwf_mp_mlwf_wannier90_ ()
> > > #12 0x00000000004331ff in vamp () at main.F:2640
> > > #13 0x000000000040ea1e in main ()
> > > #14 0x00002ad35b080b15 in __libc_start_main () from /lib64/libc.so.6
> > > #15 0x000000000040e929 in _start ()
> > >
> > >
> > > --
> > > Dr. rer. nat. Christof Köhler       email: c.koeh...@bccms.uni-bremen.de
> > > Universitaet Bremen/ BCCMS          phone:  +49-(0)421-218-62334
> > > Am Fallturm 1/ TAB/ Raum 3.12       fax: +49-(0)421-218-62770
> > > 28359 Bremen
> > >
> > > PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/
> > >
> 
> -- 
> Dr. rer. nat. Christof Köhler       email: c.koeh...@bccms.uni-bremen.de
> Universitaet Bremen/ BCCMS          phone:  +49-(0)421-218-62334
> Am Fallturm 1/ TAB/ Raum 3.12       fax: +49-(0)421-218-62770
> 28359 Bremen  
> 
> PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/



-- 
Dr. rer. nat. Christof Köhler       email: c.koeh...@bccms.uni-bremen.de
Universitaet Bremen/ BCCMS          phone:  +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.12       fax: +49-(0)421-218-62770
28359 Bremen  

PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/

