On Dec 16, 2013, at 5:40 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:
> > Once I have some more detailed information I'll follow up.

OK - I've tried to characterize the behavior with vasp, which accounts for most of our cluster usage, and it's quite odd. I ran my favorite benchmarking job four times for each configuration. As you can see below, in some cases using sm it's as fast as before (kernel 2.6.32-358.23.2.el6.x86_64), but mostly it's a factor of 2 slower. With openib and our older nodes it's always a factor of 2-4 slower. With the newer nodes, in a situation where using sm is possible it's occasionally as fast as before, but sometimes it's 10-20 times slower. When using ib with the new nodes it's always much slower than before.

openmpi is 1.7.3, recompiled with the new kernel. vasp is 5.3.3, which we've been using for months. Everything is compiled with an older stable version of the Intel compiler, as we've been doing for a long time.

More perhaps-useful information: I don't have actual data from the previous setup (perhaps I should roll back some nodes and check), but I generally expect to see 100% CPU usage on all the processes, either because they're doing numeric work or busy-waiting for MPI. Now, however, I see a few of the vasp processes at 100% and the others at 50-70% (say 4-6 on a given node at 100%, and the rest lower).

If anyone has any ideas on what's going on, or how to debug further, I'd really appreciate some suggestions.
Noam

8 core nodes (dual Xeon X5550)

8 MPI procs (single node), used to be 5.74 s; now:
  btl default, sm only, or sm+openib: 5.5-9.3 s, mostly the larger times
  btl openib: 10.0-12.2 s

16 MPI procs (2 nodes), used to be 2.88 s; now:
  btl default, openib, or sm+openib: 4.8-6.23 s

32 MPI procs (4 nodes), used to be 1.59 s; now:
  btl default, openib, or sm+openib: 2.73-4.49 s, but sometimes just fails

At least one failing run gave the errors below (the stack trace is incomplete, but probably in mpi_comm_rank, mpi_comm_size, or mpi_barrier):

[compute-3-24:32566] [[59587,0],0]:route_callback trying to get message from [[59587,1],20] to [[59587,1],28]:102, routing loop
[0] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_backtrace_print+0x1f) [0x2b5940c2dd9f]
[1] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_rml_oob.so(+0x22b6) [0x2b5941f0f2b6]
[2] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_recv_complete+0x27f) [0x2b594333341f]
[3] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(+0x9d3a) [0x2b5943334d3a]
[4] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x8bc) [0x2b5940c3592c]
[5] func:mpirun(orterun+0xe25) [0x404565]
[6] func:mpirun(main+0x20) [0x403594]
[7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3091c1ed1d]
[8] func:mpirun() [0x4034b9]

16 core nodes (dual Xeon E5-2670)

8 MPI procs (single node), not sure what it used to be, but 3.3 s is plausible:
  btl default, sm, or openib+sm: 3.3-3.4 s
  btl openib: 3.9-4.14 s

16 MPI procs (single node), used to be 2.07 s; now:
  btl default or openib: 23.0-32.56 s
  btl sm or sm+openib: 1.94-39.27 s (mostly the slower times)

32 MPI procs (2 nodes), used to be 1.24 s; now:
  btl default, sm, openib, or sm+openib: 30-97 s
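For reference, the BTL variants above were selected via Open MPI's MCA parameters on the mpirun command line. This is only a sketch of the invocations (the benchmark executable name and process counts are placeholders from the runs described above, not my exact command lines); note that the "self" BTL should always be included when restricting the BTL list:

```shell
# default: let Open MPI pick the BTLs itself
mpirun -np 16 ./vasp

# shared memory only (single-node runs)
mpirun --mca btl sm,self -np 16 ./vasp

# InfiniBand only
mpirun --mca btl openib,self -np 16 ./vasp

# shared memory + InfiniBand (multi-node runs)
mpirun --mca btl sm,openib,self -np 32 ./vasp
```

All timings were taken with the same input deck, repeated four times per configuration.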