Hi Jeff,

On Wed, Jun 20, 2012 at 04:16:12PM -0400, Jeff Squyres wrote:
> On Jun 20, 2012, at 3:36 PM, Martin Siegert wrote:
>
> > by now we know of three programs - dirac, wrf, quantum espresso - that
> > all hang with openmpi-1.4.x (have not yet checked with openmpi-1.6).
> > All of these programs run to completion with the mpiexec commandline
> > argument: --mca btl_openib_flags 305
> > We now set this in the global configuration file openmpi-mca-params.conf.
> > What is the reason that this is not the default in the first place?
> > Are there any negative effects?
>
> Two things:
>
> 1. These flags -- 305 (or 0x131 or 0001 0011 0001) translate to telling the
> openib BTL the following:
>
> - 1: SEND: meaning that the openib BTL is using send/receive semantics
> - 16: ACK: meaningless with the ob1 PML
> - 32: CHECKSUM: meaningless with the ob1 PML
> - 256: meaningless
>
> What's meaningful here is what is missing: RDMA PUT and GET. So all RDMA
> support is disabled.
>
> This will work fine, but you may want to increase your
> mca_btl_openib_eager_limit size (e.g., U. Michigan did the same thing as you
> -- disabled RDMA -- but increased the eager limit to 64k to get back some of
> the lost performance).
>
> 2. We believe that we have *finally* (just recently) fixed this issue in the
> SVN trunk and upcoming 1.6.1 release. I have a test pre-release 1.6.1
> tarball -- would you mind giving it a whirl?
>
> http://www.open-mpi.org/~jsquyres/unofficial/openmpi-1.6.1ticket3131r26612M.tar.bz2

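For reference, the bit decomposition of 305 described in the quoted message can be checked with a short Python sketch. The names for bits 1, 16, 32 and 256 are taken from the message itself; the values 2 and 4 for PUT and GET are an assumption (they are not stated in the message), and `FLAG_NAMES` and `decode_flags` are hypothetical helper names, not part of Open MPI.

```python
# Bit meanings as given in the quoted message. The PUT/GET values (2 and 4)
# are an assumption for illustration; the message only says they are missing.
FLAG_NAMES = {
    1: "SEND",
    2: "PUT (assumed bit value)",
    4: "GET (assumed bit value)",
    16: "ACK",
    32: "CHECKSUM",
    256: "described as meaningless",
}


def decode_flags(value):
    """Return the individual bits set in value, lowest bit first."""
    return [1 << i for i in range(value.bit_length()) if value & (1 << i)]


bits = decode_flags(305)
for bit in bits:
    print(bit, FLAG_NAMES.get(bit, "unknown"))

# Under the assumed bit values, neither RDMA flag is set in 305.
rdma_enabled = any(bit in bits for bit in (2, 4))
print("RDMA PUT/GET set:", rdma_enabled)
```

If the assumed PUT/GET values are right, this matches the statement above that 305 leaves only send/receive semantics enabled.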
Thanks! I tried this and, indeed, the program (I tested quantum espresso,
pw.x, so far) no longer hangs.

Then I went one step further and benchmarked the following three cases:

1) pw.x compiled with openmpi-1.3.3
2) pw.x compiled with openmpi-1.4.3 and
      btl_openib_flags = 305
      btl_openib_eager_limit = 65536
   in etc/openmpi-mca-params.conf
3) pw.x compiled with openmpi-1.6.1ticket3131r26612M

These are the results, time (in seconds) per iteration - smaller is better:

1) 33.11
2) 28.23
3) 34.81

That's rather disappointing, isn't it?

Cheers,
Martin
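
To put the three timings side by side, here is a small Python sketch that expresses each case as a percentage change relative to the openmpi-1.3.3 run. The numbers are the ones reported above; the labels, the helper name `relative_change`, and the choice of case 1 as the baseline are illustrative assumptions.

```python
# Seconds per iteration, as reported in the message (smaller is better).
timings = {
    "1) openmpi-1.3.3": 33.11,
    "2) openmpi-1.4.3, flags=305, eager_limit=65536": 28.23,
    "3) openmpi-1.6.1ticket3131r26612M": 34.81,
}

# Assumption: use case 1 as the reference point for the comparison.
baseline = timings["1) openmpi-1.3.3"]


def relative_change(t, ref):
    """Percentage change of t relative to ref; negative means faster."""
    return (t - ref) / ref * 100.0


changes = {label: relative_change(t, baseline) for label, t in timings.items()}
for label, pct in changes.items():
    print(f"{label}: {pct:+.1f}% vs. baseline")
```

By this measure case 2 needs roughly 15% less time per iteration than case 1, while case 3 needs roughly 5% more. This is a single set of runs, so the comparison says nothing about run-to-run variance.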