Are you binding the procs? We don't bind by default (this will change in 1.7.4), and binding can play a significant role when comparing across kernels.
add "--bind-to-core" to your cmd line On Dec 17, 2013, at 7:09 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote: > On Dec 16, 2013, at 5:40 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> > wrote: > >> >> Once I have some more detailed information I'll follow up. > > OK - I've tried to characterize the behavior with vasp, which accounts for > most of our cluster usage, and it's quite odd. I ran my favorite benchmarking > job repeated 4 times. As you can see below, in some > cases using sm it's as fast as before (kernel 2.6.32-358.23.2.el6.x86_64), > but mostly it's a factor of 2 slower. With openib and our older nodes it's > always a > factor of 2-4 slower. With the newer nodes in a situation where using sm is > possible it's occasionally as fast as before, but sometimes it's 10-20 times > slower. When using ib with the new nodes it's always much slower than before. > > openmpi is 1.7.3, recompiled with the new kernel. vasp is 5.3.3, which we've > been using for months. Everything is compiled with an older stable version > of the intel compiler, as we've been doing for a long time. > > More perhaps useful information - I don't have actual data from the previous > setup (perhaps I should roll back some nodes and check), but I generally > expect to see 100% cpu usage on all the processes, either because they're > doing numeric stuff, or doing a busy-wait for mpi. However, now I see a few > of the vasp processes at 100%, and the others at 50-70% (say 4-6 on a given > node at 100%, and the rest lower). > > If anyone has any ideas on what's going on, or how to debug further, I'd > really appreciate some suggestions. > > > Noam > > 8 core nodes (dual Xeon X5550) > > 8 MPI procs (single node) > used to be 5.74 s > now: > btl: default or sm only or sm+openib: 5.5-9.3 s, mostly the larger times > btl: openib: 10.0-12.2 s > > 16 MPI procs (2 nodes) > used to be 2.88 s > btl default or openib or sm+openib: 4.8 - 6.23 s > > 32 MPI procs (4 nodes) > use to be 1.59 s > btl default or openib or sm+openib: 2.73-4.49 s, but sometimes just fails > > at least once gave the errors (stack trace is incomplete, but probably on > mpi_comm_rank, mpi_comm_size, or mpi_barrier) > [compute-3-24:32566] [[59587,0],0]:route_callback trying to get message from > [[59587,1],20] to [[59587,1],28]:102, routing loop > [0] > func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_backtrace_print+0x1f) > [0x2b5940c2dd9f] > [1] > func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_rml_oob.so(+0x22b6) > [0x2b5941f0f2b6] > [2] > func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_recv_complete+0x27f) > [0x2b594333341f] > [3] > func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(+0x9d3a) > [0x2b5943334d3a] > [4] > func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x8bc) > [0x2b5940c3592c] > [5] func:mpirun(orterun+0xe25) [0x404565] > [6] func:mpirun(main+0x20) [0x403594] > [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3091c1ed1d] > [8] func:mpirun() [0x4034b9] > > > 16 core nodes (dual Xeon E5-2670) > > 8 MPI procs (single node) > not sure what it used to be, but 3.3 s is plausible > btl: default or sm or openib+sm: 3.3-3.4 s > btl: openib 3.9-4.14 s > > 16 MPI procs (single node) > used to be 2.07 s > btl default or openib: 23.0-32.56 s > btl sm or sm+openib: 1.94 s - 39.27 s (mostly the slower times) > > 32 MPI procs (2 nodes) > used to be 1.24 s > btl default or sm or openib or sm+openib: 
30s - 97 > s_______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
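For reference, a minimal sketch of what the launch line might look like with binding enabled. The process count, and the "./vasp" binary name, are placeholders, not taken from the thread above:

    # bind each rank to a core and print the resulting bindings at startup
    mpirun --bind-to-core --report-bindings -np 16 ./vasp

The 1.7 series should also accept the newer "--bind-to core" spelling of the same option. "--report-bindings" makes mpirun print which cores each rank was bound to, which is a quick way to confirm the binding actually took effect before re-running the benchmark.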