Hi,

What happens when you don't run with per-peer queue pairs? Try:
  -mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,128

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 23, 2011, at 7:56 AM, Mathieu Gontier wrote:

> Hello,
>
> Thanks for the answer.
> I am testing with Open MPI 1.4.3: my computation is still queued, but I did
> not see anything obvious related to my issue. Have you seen anything that
> could solve it?
> I am going to submit my computation with --mca mpi_leave_pinned 0, but do
> you have any idea how it affects performance compared to using Ethernet?
>
> Many thanks for your support.
>
> On 06/23/2011 03:01 PM, Josh Hursey wrote:
>>
>> I wonder if this is related to memory pinning. Can you try turning off
>> leave-pinned and see if the problem persists (this may affect
>> performance, but should avoid the crash):
>>
>>   mpirun ... --mca mpi_leave_pinned 0 ...
>>
>> Also, it looks like Smoky has a slightly newer version of the 1.4
>> branch that you should try to switch to if you can. The following
>> command will show you all of the available installs on that machine:
>>
>>   shell$ module avail ompi
>>
>> For a list of supported compilers for that version, try the 'show' option:
>>
>>   shell$ module show ompi/1.4.3
>>   -------------------------------------------------------------------
>>   /sw/smoky/modulefiles-centos/ompi/1.4.3:
>>
>>   module-whatis   This module configures your environment to make Open
>>                   MPI 1.4.3 available.
>>   Supported Compilers:
>>     pathscale/3.2.99
>>     pathscale/3.2
>>     pgi/10.9
>>     pgi/10.4
>>     intel/11.1.072
>>     gcc/4.4.4
>>     gcc/4.4.3
>>   -------------------------------------------------------------------
>>
>> Let me know if that helps.
>>
>> Josh
>>
>>
>> On Wed, Jun 22, 2011 at 4:16 AM, Mathieu Gontier
>> <mathieu.gont...@gmail.com> wrote:
>>> Dear all,
>>>
>>> First of all, my apologies for posting this message to both the bug and
>>> user mailing lists, but for the moment I do not know whether it is a bug!
>>>
>>> I am running a structured CFD flow solver at ORNL, and I have access to a
>>> small cluster (Smoky) using Open MPI 1.4.2 with InfiniBand by default.
>>> Recently we increased the size of our models, and since then we have run
>>> into many InfiniBand-related problems. The most serious is a hard crash
>>> with the following error message:
>>>
>>> [smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
>>> error creating qp errno says Cannot allocate memory
>>>
>>> If we force the solver to use Ethernet (mpirun -mca btl ^openib) the
>>> computation works correctly, although very slowly (a single iteration
>>> takes ages). Do you have any idea what could be causing these problems?
>>>
>>> If it is due to a bug or a limitation in Open MPI, do you think version
>>> 1.4.3, the upcoming 1.4.4, or any 1.5 version could solve the problem? I
>>> read the release notes, but I did not see any obvious patch that would
>>> fix my problem. The system administrator is ready to compile a new
>>> package for us, but I do not want to ask them to install too many.
>>>
>>> Thanks.
>>> --
>>>
>>> Mathieu Gontier
>>> skype: mathieu_gontier
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>
> --
>
> Mathieu Gontier
> skype: mathieu_gontier
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
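
[Editor's note: for reference, a minimal sketch of a single launch that combines the two suggestions made in this thread, i.e. shared receive queues (the "S" entries) instead of per-peer queue pairs, plus leave-pinned disabled. The process count, hostfile, solver binary, and input file names are placeholders, not taken from the original messages:

  shell$ mpirun -np 64 --hostfile hosts \
             --mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,128 \
             --mca mpi_leave_pinned 0 \
             ./cfd_solver input.cfg

The two settings address different suspects (queue-pair memory exhaustion vs. registered-memory pinning), so they can also be tested one at a time to see which one actually avoids the qp_create_one allocation failure.]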