IIRC, there used to be a bug in Open MPI that led to such a false positive,
but I cannot remember the details.
I recommend you use at least the latest 1.10 release (which is really 1.8 plus
a few more features and several bug fixes).
Another option is to simply increase one of the mtt parameters (log_num_mtt or
log_mtts_per_seg) by 1 and see if that helps.
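
For reference, here is a minimal Python sketch of the max_reg_mem formula
quoted below, to show what bumping a parameter by 1 does. The helper names
(max_reg_mem, min_log_num_mtt) are made up for illustration, and the sketch
assumes the values read from sysfs are used literally by the driver, which
may not be the case when log_num_mtt reads 0.

```python
def max_reg_mem(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory in bytes, per the formula quoted in this thread."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size


def min_log_num_mtt(target_bytes, log_mtts_per_seg, page_size=4096):
    """Smallest log_num_mtt whose max_reg_mem reaches target_bytes (hypothetical helper)."""
    log_num_mtt = 0
    while max_reg_mem(log_num_mtt, log_mtts_per_seg, page_size) < target_bytes:
        log_num_mtt += 1
    return log_num_mtt


if __name__ == "__main__":
    MIB = 1024 ** 2
    total_memory = 65459 * MIB  # "Total memory" from the warning

    # Values reported in the quoted message: log_num_mtt=0, log_mtts_per_seg=3
    print(max_reg_mem(0, 3), "bytes")
    # Each +1 on either parameter doubles the result
    print(max_reg_mem(1, 3), "bytes")
    # Smallest log_num_mtt that covers all physical memory on node107
    print(min_log_num_mtt(total_memory, 3))
```

Note that log_num_mtt=20 with log_mtts_per_seg=3 gives exactly 32768 MiB,
the "Registerable memory" figure in the warning, and one more increment
would cover the 65459 MiB of total memory.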

Cheers,

Gilles

On Sunday, March 26, 2017, Ilchenko Evgeniy <ilchenk...@gmail.com> wrote:

> Hi all!
>
> I installed an older version, Open MPI 1.8, and got a different error. For
> the command
>
> mpirun -np 1 prog
>
> I get the following output:
>
> --------------------------------------------------------------------------
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory. This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of
> physical memory that can be registered. You should investigate the
> relevant Linux kernel module parameters that control how much physical
> memory can be registered, and increase them to allow registering all
> physical memory on your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel module
> parameters:
> http://www.open-mpi.org/faq/?category=openfabrics#ib-..
>
> Local host: node107
> Registerable memory: 32768 MiB
> Total memory: 65459 MiB
>
> Your MPI job will continue, but may be behave poorly and/or hang.
> --------------------------------------------------------------------------
> hello from 0
> hello from 1
> [node107:48993] 1 more process has sent help message help-mpi-btl-openib.txt / reg mem limit low
> [node107:48993] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> Other installed software (the Intel MPI Library) works fine, without any
> errors, and uses all 64 GB of memory.
>
> With Open MPI I don't use any batch manager (Torque, Slurm, etc.); I work
> on a single node. I log in to the node with the command
>
> ssh node107
>
> For the command
>
> cat /etc/security/limits.conf
>
> I get the following output:
>
> ...
> * soft rss  2000000
> * soft stack    2000000
> * hard stack    unlimited
> * soft data     unlimited
> * hard data     unlimited
> * soft memlock unlimited
> * hard memlock unlimited
> * soft nproc   10000
> * hard nproc   10000
> * soft nofile   10000
> * hard nofile   10000
> * hard cpu unlimited
> * soft cpu unlimited
> ...
>
> For the command
>
> cat /sys/module/mlx4_core/parameters/log_num_mtt
>
> I get the output:
>
> 0
>
> Command:
>
> cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
>
> output:
>
> 3
>
> Command:
>
> getconf PAGESIZE
>
> output:
>
> 4096
>
> With these parameters and the formula
>
> max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE
>
> I get max_reg_mem = 32768 bytes, not 32 GB as stated in the Open MPI warning.
>
> I think the cause of the errors in the different versions (1.8 and 2.1) is
> the same...
>
> What is the reason for this?
>
> Which programs or settings could restrict memory for Open MPI?
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users