(Cross-posted to the 'users' and 'devel' mailing lists)

Dear Open MPI developers,

a long time ago I reported an error in Open MPI:
http://www.open-mpi.org/community/lists/users/2012/02/18565.php

Well, in 1.6 the behaviour has changed: the test case no longer hangs forever and blocks an InfiniBand interface. Instead it seems to run through, and this error message is now printed:
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:

  Local host:
  Device:        mlx4_0
  Function:      openib_reg_mr()
  Errno says:    Cannot allocate memory

You may need to consult with your system administrator to get this
problem fixed.
--------------------------------------------------------------------------

Looking into the FAQ
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
gives us no hint about what is wrong. The locked memory limit is unlimited:

--------------------------------------------------------------------------
pk224850@linuxbdc02:~[502]$ cat /etc/security/limits.conf | grep memlock
#        - memlock - max locked-in-memory address space (KB)
*        hard    memlock    unlimited
*        soft    memlock    unlimited
--------------------------------------------------------------------------

Could it still be an Open MPI issue? Are you interested in reproducing this?

Best,
Paul Kapinos

P.S.: The same test with Intel MPI cannot run using DAPL, but runs fine over 'ofa' (= native verbs, as Open MPI uses them). So I believe the problem is rooted in the communication pattern of the program: it sends very LARGE messages to many or all other processes. (The program performs a matrix transposition of a distributed matrix.)
-- 
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915