Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread Noam Bernstein via users
> On Jun 19, 2019, at 5:05 PM, Joshua Ladd wrote:
>
> Hi, Noam
>
> Can you try your original command line with the following addition:
>
> mpirun --mca pml ucx --mca btl ^vader,tcp,openib -mca osc ucx
>
> I think we're seeing some conflict between UCX PML and UCT OSC.

I did this, although me

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread Joshua Ladd via users
Hi, Noam

Can you try your original command line with the following addition:

mpirun --mca pml ucx --mca btl ^vader,tcp,openib -mca osc ucx

I think we're seeing some conflict between UCX PML and UCT OSC.

Josh

On Wed, Jun 19, 2019 at 4:36 PM Noam Bernstein via users <users@lists.open-mpi.org>

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread Noam Bernstein via users
> On Jun 19, 2019, at 2:44 PM, George Bosilca wrote:
>
> To completely disable UCX you need to disable the UCX MTL and not only the
> BTL. I would use "--mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1".

Thanks for the pointer. Disabling ucx this way _does_ seem to fix the memory issue.

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread George Bosilca via users
To completely disable UCX you need to disable the UCX MTL and not only the BTL. I would use "--mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1". As you have a gdb session on the processes, you can try to break on some of the memory allocation functions (malloc, realloc, calloc). George.

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread Noam Bernstein via users
I tried to disable ucx (successfully, I think - I replaced the "--mca btl ucx --mca btl ^vader,tcp,openib" with "--mca btl_openib_allow_ib 1", and attaching gdb to a running process shows no ucx-related routines active). It still has the same fast-growing (1 GB/s) memory usage problem.
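For anyone wanting to put a number on growth like the 1 GB/s mentioned above, here is a minimal sketch that reads VmRSS from /proc/<pid>/status twice and converts the difference to a rate. It assumes Linux; the function names (vmrss_kb, growth_rate_gib_per_s, sample_growth) are made up for illustration and are not from the thread or from Open MPI.

```python
import re
import time


def vmrss_kb(status_text):
    """Extract VmRSS in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_rate_gib_per_s(rss_start_kb, rss_end_kb, seconds):
    """Convert an RSS change in kB over `seconds` into GiB per second."""
    return (rss_end_kb - rss_start_kb) / (1024 ** 2) / seconds


def sample_growth(pid, interval=1.0):
    """Read VmRSS for `pid` twice, `interval` seconds apart (Linux only)."""
    path = f"/proc/{pid}/status"
    with open(path) as f:
        start = vmrss_kb(f.read())
    time.sleep(interval)
    with open(path) as f:
        end = vmrss_kb(f.read())
    if start is None or end is None:
        return None  # no VmRSS line (e.g. kernel threads)
    return growth_rate_gib_per_s(start, end, interval)
```

Calling sample_growth(pid) on one MPI rank gives a per-process figure only. It will not show memory the kernel holds on the process's behalf, which is what slabtop and /proc/meminfo are for.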

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread Noam Bernstein via users
> On Jun 19, 2019, at 2:00 PM, John Hearns via users wrote:
>
> Noam, it may be a stupid question. Could you try running slabtop as the
> program executes

The top SIZE usage is this line

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
5937540 5937540 100%

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread John Hearns via users
Noam, it may be a stupid question. Could you try running slabtop as the program executes? Also, 'watch cat /proc/meminfo' is a good diagnostic.

On Wed, 19 Jun 2019 at 18:32, Noam Bernstein via users <users@lists.open-mpi.org> wrote:

> Hi - we’re having a weird problem with OpenMPI on ou
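As a scripted variant of watching /proc/meminfo by eye, here is a rough sketch that parses two samples and reports how selected fields changed. It assumes Linux; the helper names and the default field list (MemFree, Slab, SUnreclaim) are illustrative choices, not something suggested in the thread.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; the HugePages_* counters are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def meminfo_delta(before, after, keys=("MemFree", "Slab", "SUnreclaim")):
    """Return after - before for each requested field present in both samples."""
    return {k: after[k] - before[k] for k in keys if k in before and k in after}


def watch_meminfo(interval=1.0, path="/proc/meminfo"):
    """Take two samples `interval` seconds apart and return the deltas."""
    with open(path) as f:
        first = parse_meminfo(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_meminfo(f.read())
    return meminfo_delta(first, second)
```

If Slab or SUnreclaim climbs while the job runs, that points at kernel-side allocations, which is the case slabtop breaks down by cache.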

[OMPI users] growing memory use from MPI application

2019-06-19 Thread Noam Bernstein via users
Hi - we’re having a weird problem with OpenMPI on our newish InfiniBand EDR (mlx5) nodes. We're running CentOS 7.6, with all the InfiniBand and ucx libraries as provided by CentOS, i.e.

ucx-1.4.0-1.el7.x86_64
libibverbs-utils-17.2-3.el7.x86_64
libibverbs-17.2-3.el7.x86_64
libibumad-17.2-3.el7.x8