Michael,

Could you try to run this again with the "--mca mpi_leave_pinned 0" parameter? I suspect that this might be due to a message size problem: MPI may be trying to do RDMA with a message bigger than what the HCA supports.
-- YK

On 11-Apr-11 7:44 PM, Michael Di Domenico wrote:
> Here's a chunk of code that reproduces the error every time on my cluster
>
> If you call it with $((2**24)) as a parameter it should run fine, change it
> to $((2**27)) and it will stall
>
> On Tue, Apr 5, 2011 at 11:24 AM, Terry Dontje <terry.don...@oracle.com> wrote:
>
> It was asked during the community concall whether the below may be
> related to ticket #2722 https://svn.open-mpi.org/trac/ompi/ticket/2722?
>
> --td
>
> On 04/04/2011 10:17 PM, David Zhang wrote:
>> Any error messages? Maybe the nodes ran out of memory? I know MPI
>> implements some kind of buffering under the hood, so even though you're
>> sending arrays over 2^26 in size, it may require more than that for MPI to
>> actually send it.
>>
>> On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico
>> <mdidomeni...@gmail.com> wrote:
>>
>> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
>> messages over 2^26 in size?
>>
>> For a reason I have not determined just yet, machines on my cluster
>> (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) are failing to send
>> arrays over 2^26 in size via the AllToAll collective. (user code)
>>
>> Further testing seems to indicate that an MPI message over 2^26 fails
>> (tested with IMB-MPI)
>>
>> Running the same test on a different older IB connected cluster seems
>> to work, which would seem to indicate a problem with the infiniband
>> drivers of some sort rather than openmpi (but I'm not sure).
>>
>> Any thoughts, directions, or tests?
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> David Zhang
>> University of California, San Diego
>
> --
> Oracle
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle *- Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
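
As a rough aid for reading the sizes mentioned in this thread, the sketch below converts 2^24, 2^26, and 2^27 into byte counts. The thread does not say whether the parameter counts bytes or array elements, so the element widths tried here are assumptions, and the helper names (`human`, `message_bytes`) are purely illustrative. It makes no claim about what limit the HCA or driver actually enforces.

```python
# Sizes mentioned in the thread, expressed as powers of two.
EXPONENTS = (24, 26, 27)

# Assumption: the unit is not stated in the thread, so several element
# widths are tried (1 = raw bytes, 4 = int/float, 8 = double).
ELEMENT_BYTES = (1, 4, 8)


def human(nbytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(nbytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


def message_bytes(exponent: int, element_bytes: int) -> int:
    """Bytes occupied by 2**exponent elements of the given width."""
    return (2 ** exponent) * element_bytes


if __name__ == "__main__":
    for exp in EXPONENTS:
        row = ", ".join(
            f"{width} B/elem: {human(message_bytes(exp, width))}"
            for width in ELEMENT_BYTES
        )
        print(f"2^{exp} elements -> {row}")
```

Comparing the row for the largest size that works against the smallest size that stalls, under whichever element width the test code really uses, gives a byte range in which to look for a message-size limit.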