Hi - we’re having a weird problem with OpenMPI on our newish InfiniBand EDR 
(mlx5) nodes.  We're running CentOS 7.6, with all the InfiniBand and UCX 
libraries as provided by CentOS, i.e.
kernel is 
I’ve compiled my own OpenMPI, version 4.0.1 (--with-verbs --with-ofi --with-ucx).

The job is started with
mpirun --mca pml ucx --mca btl ^vader,tcp,openib
as recommended for UCX.
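For reference, a quick way to confirm the UCX build and which transports it detects on a node (ucx_info ships with UCX; the exact output format varies between UCX releases):

```shell
# Print the UCX library version, then list the transports/devices UCX
# detects on this node (output format differs across UCX versions).
ucx_info -v
ucx_info -d | grep -i -E 'transport|device'
```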

We have some jobs (one particular code, some but not all sets of input 
parameters) that appear to take an increasing amount of memory (in MPI?) until 
the node crashes.  The total memory used by all processes (reported by ps or 
top) is not increasing, but “free” reports less and less available memory.  
Within a couple of minutes it uses all of the 96GB on each of the nodes. When 
the job is killed the processes go away, but the memory usage (as reported by 
“free”) stays the same, e.g.:
              total        used        free      shared  buff/cache   available
Mem:       98423956    88750140     7021688        2184     2652128     6793020
Swap:      65535996      365312    65170684
As far as I can tell I have to reboot to get the memory back.
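To illustrate the discrepancy, this is roughly how I’m comparing the ranks’ resident memory against what the kernel says is available (the application name my_mpi_app is a placeholder for our real binary):

```shell
# Sum the resident set size (kB) of all ranks of the job (my_mpi_app is
# a placeholder name) and compare with the kernel's MemAvailable from
# /proc/meminfo; the first number stays flat while the second keeps falling.
rss_kb=$(ps -o rss= -C my_mpi_app | awk '{s+=$1} END {print s+0}')
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
echo "ranks RSS: ${rss_kb} kB   MemAvailable: ${avail_kb} kB"
```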

If I attach to a running process with “gdb -p”, I see stack traces that look 
like these two examples (starting from the first mpi-related call):

#0  0x00002b22a95134a3 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00002b22be73a3e8 in mlx5_poll_cq_v1 () from 
#2  0x00002b22bcb267de in uct_ud_verbs_iface_progress () from /lib64/libuct.so.0
#3  0x00002b22bc8d28b2 in ucp_worker_progress () from /lib64/libucp.so.0
#4  0x00002b22b7cd14e7 in mca_pml_ucx_progress () from 
#5  0x00002b22ab6064fc in opal_progress () from 
#6  0x00002b22a9f51dc5 in ompi_request_default_wait () from 
#7  0x00002b22a9fa355c in ompi_coll_base_allreduce_intra_ring () from 
#8  0x00002b22a9f65cb3 in PMPI_Allreduce () from 
#9  0x00002b22a9cedf9b in pmpi_allreduce__ () from 

#0  0x00002ae0518de69d in write () from /lib64/libpthread.so.0
#1  0x00002ae064458d7f in ibv_cmd_reg_mr () from /usr/lib64/libibverbs.so.1
#2  0x00002ae066b9221b in mlx5_reg_mr () from 
#3  0x00002ae064461f08 in ibv_reg_mr () from /usr/lib64/libibverbs.so.1
#4  0x00002ae064f6e312 in uct_ib_md_reg_mr.isra.11.constprop () from 
#5  0x00002ae064f6e4f2 in uct_ib_rcache_mem_reg_cb () from /lib64/libuct.so.0
#6  0x00002ae0651aec0f in ucs_rcache_get () from /lib64/libucs.so.0
#7  0x00002ae064f6d6a4 in uct_ib_mem_rcache_reg () from /lib64/libuct.so.0
#8  0x00002ae064d1fa58 in ucp_mem_rereg_mds () from /lib64/libucp.so.0
#9  0x00002ae064d21438 in ucp_request_memory_reg () from /lib64/libucp.so.0
#10 0x00002ae064d21663 in ucp_request_send_start () from /lib64/libucp.so.0
#11 0x00002ae064d335dd in ucp_tag_send_nb () from /lib64/libucp.so.0
#12 0x00002ae06420a5e6 in mca_pml_ucx_start () from 
#13 0x00002ae05236fc06 in ompi_coll_base_alltoall_intra_basic_linear () from 
#14 0x00002ae05232f347 in PMPI_Alltoall () from 
#15 0x00002ae0520b704c in pmpi_alltoall__ () from 
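In case it’s useful to anyone reproducing this, a one-shot way to grab backtraces from every rank without an interactive gdb session (again, my_mpi_app is a placeholder for the real binary name):

```shell
# Attach briefly to each rank, dump all thread backtraces to a per-PID
# file, and detach; gdb's -batch mode exits automatically when done.
for pid in $(pgrep my_mpi_app); do
  gdb -p "$pid" -batch -ex 'thread apply all bt' > "bt.${pid}.txt" 2>&1
done
```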

This doesn’t seem to happen on our older nodes (which have FDR mlx4 adapters).

I don’t really have a mental model for OpenMPI's memory use, so I don’t know 
which component I should investigate: OpenMPI itself? UCX? OFED? Something 
else?  If anyone has suggestions for what to try, and/or what other 
information would be useful, I’d appreciate it.
