Michael,

Could you try running this again with the "--mca mpi_leave_pinned 0" parameter?
I suspect this might be a message size problem: MPI may be trying
to do RDMA with a message bigger than what the HCA supports.
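
For example, assuming the reproducer is built as ./alltoall_test and run on
2 processes (both are placeholders of mine, not taken from your setup), the
command line could be put together like this. It is a dry-run sketch that
only prints the command:

```shell
# Hypothetical values: adjust the process count and binary name to your build.
NP=2
APP=./alltoall_test

# The size reported to stall; same value as $((2**27)) in bash.
COUNT=$((1 << 27))

# Assemble the mpirun command with leave-pinned behaviour turned off.
CMD="mpirun --mca mpi_leave_pinned 0 -np $NP $APP $COUNT"

# Print it for inspection instead of launching anything.
echo "$CMD"
```

Once the names match your build, run the printed command directly.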

-- YK

On 11-Apr-11 7:44 PM, Michael Di Domenico wrote:
> Here's a chunk of code that reproduces the error every time on my cluster.
> 
> If you call it with $((2**24)) as a parameter it should run fine; change it 
> to $((2**27)) and it will stall.
> 
> On Tue, Apr 5, 2011 at 11:24 AM, Terry Dontje <terry.don...@oracle.com> wrote:
> 
>     It was asked during the community concall whether the below may be 
> related to ticket #2722 (https://svn.open-mpi.org/trac/ompi/ticket/2722).
> 
>     --td
> 
>     On 04/04/2011 10:17 PM, David Zhang wrote:
>>     Any error messages?  Maybe the nodes ran out of memory?  I know MPI 
>> implements some kind of buffering under the hood, so even though you're 
>> sending arrays over 2^26 in size, it may require more than that for MPI to 
>> actually send them.
>>
>>     On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico 
>> <mdidomeni...@gmail.com> wrote:
>>
>>         Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
>>         messages over 2^26 in size?
>>
>>         For a reason I have not determined yet, machines on my cluster
>>         (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) are failing to send
>>         arrays over 2^26 in size via the AllToAll collective (user code).
>>
>>         Further testing seems to indicate that an MPI message over 2^26 fails
>>         (tested with IMB-MPI)
>>
>>         Running the same test on a different, older IB-connected cluster seems
>>         to work, which would seem to indicate a problem with the InfiniBand
>>         drivers of some sort rather than OpenMPI (but I'm not sure).
>>
>>         Any thoughts, directions, or tests?
>>         _______________________________________________
>>         users mailing list
>>         us...@open-mpi.org <mailto:us...@open-mpi.org>
>>         http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>>
>>     -- 
>>     David Zhang
>>     University of California, San Diego
>>
> 
> 
>     -- 
>     Oracle
>     Terry D. Dontje | Principal Software Engineer
>     Developer Tools Engineering | +1.781.442.2631
>     Oracle *- Performance Technologies*
>     95 Network Drive, Burlington, MA 01803
>     Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>
> 
