Hi Waleed

Even before any OFED upgrade, you could try the items
in the list below.
I have OMPI 1.6.5 and 1.8.3 working with an older OFED version
with those settings.
The problem is not really OMPI's fault, but InfiniBand/OFED's.

1) Make sure your locked memory is set to unlimited in
/etc/security/limits.conf

For instance:

*               soft    memlock         unlimited
*               hard    memlock         unlimited
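
As a rough check that the new limit took effect, something like the sketch below can be run in a fresh login session (limits.conf is applied at login, so an already open shell keeps its old value). The helper name memlock_is_unlimited is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a value printed by `ulimit -l`
# is the string "unlimited".
memlock_is_unlimited() {
    [ "$1" = "unlimited" ]
}

# Soft locked-memory limit of this shell, in KiB or "unlimited".
current="$(ulimit -l)"

if memlock_is_unlimited "$current"; then
    echo "locked memory: unlimited"
else
    echo "locked memory is limited to ${current} KiB; check limits.conf and log in again"
fi
```

The same check is worth repeating on every compute node, not only on the head node.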


2) If you are using a queue system, make sure it sets the
locked memory to unlimited, so that all child processes
(including your mpiexec and MPI executable) inherit it.

For instance, with Torque, add this to /etc/init.d/pbs_mom
or to /etc/sysconfig/pbs_mom:

# locked memory
ulimit -l unlimited
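
To see what a process actually inherited, one option on Linux is to read its /proc/<pid>/limits file. The sketch below assumes that file format; the helper name soft_memlock_from_limits is made up for illustration, and the commented-out pgrep line assumes the daemon process is called pbs_mom.

```shell
#!/bin/bash
# Hypothetical helper: read limits text on stdin (the layout of
# /proc/<pid>/limits on Linux) and print the soft "Max locked memory" value.
# Columns are: "Max locked memory", soft limit, hard limit, units.
soft_memlock_from_limits() {
    awk '/^Max locked memory/ { print $4 }'
}

# Limit seen by this shell; run inside a batch job, it shows what the job got.
if [ -r /proc/self/limits ]; then
    echo "this shell: $(soft_memlock_from_limits < /proc/self/limits)"
fi

# Limit of the running daemon (assumes a process named pbs_mom exists):
# soft_memlock_from_limits < "/proc/$(pgrep -o pbs_mom)/limits"
```

If the daemon still shows a finite value after the change, it probably needs to be restarted so that it picks up the new ulimit.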

3) Add the parameters below to
/etc/modprobe.d/mlx4_core.conf

options mlx4_core log_num_mtt=22 log_mtts_per_seg=1

Do this with care, as the right values depend on the amount of physical RAM.
In addition, the parameters seem to have been deprecated in 3.X kernels, which makes this tricky.

See these FAQs:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-user
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
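
The FAQ entries above give the relationship
max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
and suggest allowing about twice the physical RAM. A minimal sketch of that arithmetic follows; the function name max_reg_mem_mib is made up for illustration, and the figures in the comments assume 4096-byte pages.

```shell
#!/bin/bash
# Hypothetical helper: registerable memory in MiB for given
# log_num_mtt ($1), log_mtts_per_seg ($2) and page size in bytes ($3).
max_reg_mem_mib() {
    echo $(( (1 << $1) * (1 << $2) * $3 / 1048576 ))
}

page_size="$(getconf PAGESIZE)"

# Settings from item 3; with 4096-byte pages this is 32768 MiB,
# i.e. twice the RAM of a 16 GB node.
echo "log_num_mtt=22 log_mtts_per_seg=1: $(max_reg_mem_mib 22 1 "$page_size") MiB"

# With 4096-byte pages this is 8192 MiB. That matches the value in the
# warning only under the assumption that log_mtts_per_seg is 1.
echo "log_num_mtt=20 log_mtts_per_seg=1: $(max_reg_mem_mib 20 1 "$page_size") MiB"
```

When the mlx4_core module is loaded, the values in effect can usually be read from /sys/module/mlx4_core/parameters/ and plugged into the same formula.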

***
Having said that, a question remains unanswered:
Why is InfiniBand such a nightmare?
***

I hope this helps,
Gus Correa

On 12/30/2014 09:16 AM, Waleed Lotfy wrote:
Thank you, Devendar, for your response.

I'll test it on a new installation with OFED 2.3.2 and OMPI v1.6.5. If that
doesn't work, I'll give 1.8.4 a try.

Thank you for your help and I'll get back to you with hopefully good results.

Waleed Lotfy
Bibliotheca Alexandrina
________________________________
From: users [users-boun...@open-mpi.org] on behalf of Deva 
[devendar.bure...@gmail.com]
Sent: Monday, December 29, 2014 8:29 PM
To: Open MPI Users
Subject: Re: [OMPI users] Icreasing OFED registerable memory

Hi Waleed,

It is highly recommended to upgrade to the latest OFED. Meanwhile, can you try
the latest OMPI release (v1.8.4), where this warning is ignored on older OFEDs?

-Devendar

On Sun, Dec 28, 2014 at 6:03 AM, Waleed Lotfy
<waleed.lo...@bibalex.org> wrote:
I have a bunch of 8 GB memory nodes in a cluster that were recently
upgraded to 16 GB. When I run any job, I get the following warning:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

     http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

   Local host:              comp022.local
   Registerable memory:     8192 MiB
   Total memory:            16036 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------

Searching for a fix to this issue, I found that I have to set
log_num_mtt within the kernel module, so I added this line to
modprobe.conf:

options mlx4_core log_num_mtt=21

But then the ib0 interface fails to start, showing this error:
ib_ipoib device ib0 does not seem to be present, delaying initialization.

Reducing log_num_mtt to 20 allows ib0 to start, but the warning about
8 GB of registerable memory still appears.

I am using OFED 1.3.1. I know it is pretty old, and we are planning to
upgrade soon.

Output on all nodes for 'ompi_info  -v ompi full --parsable':

ompi:version:full:1.2.7
ompi:version:svn:r19401
orte:version:full:1.2.7
orte:version:svn:r19401
opal:version:full:1.2.7
opal:version:svn:r19401

Any help would be appreciated.

Waleed Lotfy
Bibliotheca Alexandrina
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/12/26076.php



Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/12/26088.php

