On Dec 16, 2013, at 5:40 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:

> 
> Once I have some more detailed information I'll follow up.

OK - I've tried to characterize the behavior with VASP, which accounts for
most of our cluster usage, and it's quite odd.  I ran my favorite benchmark
job, repeating each case 4 times.  As you can see below, in some cases using
sm it's as fast as it was before (under kernel 2.6.32-358.23.2.el6.x86_64),
but mostly it's a factor of 2 slower.  With openib on our older nodes it's
always a factor of 2-4 slower.  On the newer nodes, in cases where sm can be
used, it's occasionally as fast as before, but sometimes it's 10-20 times
slower.  With openib on the new nodes it's always much slower than before.
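
(In case it helps anyone reproduce this: by "btl X" in the numbers below I
mean selecting the transports with the usual MCA parameter, i.e. something
like

  mpirun --mca btl sm,self ...
  mpirun --mca btl openib,self ...
  mpirun --mca btl sm,openib,self ...

and "default" means not setting --mca btl at all.)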

Open MPI is 1.7.3, recompiled after the kernel upgrade.  VASP is 5.3.3, which
we've been using for months.  Everything is compiled with an older, stable
version of the Intel compiler, as we've been doing for a long time.

One more possibly useful observation: I don't have actual data from the
previous setup (perhaps I should roll back some nodes and check), but I
generally expect to see 100% CPU usage on all the processes, either because
they're doing numerical work or because they're busy-waiting in MPI.  Now,
however, I see a few of the VASP processes at 100% and the others at 50-70%
(say 4-6 on a given node at 100%, and the rest lower).
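
Two things I'm planning to check in connection with that (both are guesses,
so take the exact option and parameter names with a grain of salt): whether
the ranks are still being bound to cores the way they were before, e.g.

  mpirun --report-bindings --bind-to core ...

and whether Open MPI is now yielding when idle (which would also show up as
<100% CPU), e.g.

  ompi_info --all | grep yield
  mpirun --mca mpi_yield_when_idle 0 ...

I haven't verified yet that either of these is actually relevant.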

If anyone has any ideas on what's going on, or how to debug further, I'd
really appreciate some suggestions.
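
In the meantime, to try to separate raw MPI overhead from the VASP numerics,
I'm planning to run a trivial timing loop, roughly along the lines of the
sketch below (file name, message size, and iteration count are arbitrary
placeholders; the idea is to run it under the same btl settings as the VASP
jobs and compare old vs. new kernel):

/* mpi_microbench.c (name is arbitrary) - rough check of raw MPI overhead,
 * timing MPI_Barrier and a mid-sized MPI_Allreduce, independent of VASP.
 * Build:  mpicc -O2 mpi_microbench.c -o mpi_microbench
 * Run:    mpirun --mca btl sm,openib,self -np 16 ./mpi_microbench
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    const int iters = 1000;    /* arbitrary, just long enough to time reliably */
    const int n = 1 << 16;     /* arbitrary message size: 64k doubles */
    double *in, *out, t0, t_barrier, t_allreduce;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    in  = malloc(n * sizeof(double));
    out = malloc(n * sizeof(double));
    for (i = 0; i < n; i++)
        in[i] = (double) i;

    /* warm up connections before timing anything */
    for (i = 0; i < 10; i++)
        MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* time MPI_Barrier */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++)
        MPI_Barrier(MPI_COMM_WORLD);
    t_barrier = (MPI_Wtime() - t0) / iters;

    /* time MPI_Allreduce of n doubles */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++)
        MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    t_allreduce = (MPI_Wtime() - t0) / iters;

    if (rank == 0)
        printf("%d ranks: barrier %.2f us, allreduce of %d doubles %.2f us\n",
               nprocs, 1e6 * t_barrier, n, 1e6 * t_allreduce);

    free(in);
    free(out);
    MPI_Finalize();
    return 0;
}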

                                                                                
                Noam

8 core nodes (dual Xeon X5550)

8 MPI procs (single node)
used to be 5.74 s
now:
btl default or sm only or sm+openib: 5.5-9.3 s, mostly the larger times
btl openib: 10.0-12.2 s

16 MPI procs (2 nodes)
used to be 2.88 s
btl default or openib or sm+openib: 4.8-6.23 s

32 MPI procs (4 nodes)
used to be 1.59 s
btl default or openib or sm+openib: 2.73-4.49 s, but sometimes just fails

At least once it gave the following error (the stack trace is incomplete, but
the failure is probably in mpi_comm_rank, mpi_comm_size, or mpi_barrier):
[compute-3-24:32566] [[59587,0],0]:route_callback trying to get message from [[59587,1],20] to [[59587,1],28]:102, routing loop
[0] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_backtrace_print+0x1f) [0x2b5940c2dd9f]
[1] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_rml_oob.so(+0x22b6) [0x2b5941f0f2b6]
[2] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_recv_complete+0x27f) [0x2b594333341f]
[3] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(+0x9d3a) [0x2b5943334d3a]
[4] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x8bc) [0x2b5940c3592c]
[5] func:mpirun(orterun+0xe25) [0x404565]
[6] func:mpirun(main+0x20) [0x403594]
[7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3091c1ed1d]
[8] func:mpirun() [0x4034b9]
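
(I don't know whether this is related to the slowdown at all, but since that
message seems to come from the OOB/routed layer, one experiment I may try is
forcing a different routed component, e.g.

  mpirun --mca routed binomial ...

assuming that's still a valid component name in 1.7.3, just to see whether
the failure mode changes.)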


16 core nodes (dual Xeon E5-2670)

8 MPI procs (single node)
not sure what it used to be, but 3.3 s is plausible
btl default or sm or openib+sm: 3.3-3.4 s
btl openib: 3.9-4.14 s

16 MPI procs (single node)
used to be 2.07 s
btl default or openib: 23.0-32.56 s
btl sm or sm+openib: 1.94-39.27 s (mostly the slower times)

32 MPI procs (2 nodes)
used to be 1.24 s
btl default or sm or openib or sm+openib: 30-97 s
