> On Jun 20, 2019, at 9:40 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> 
> On Jun 20, 2019, at 9:31 AM, Noam Bernstein via users 
> <users@lists.open-mpi.org> wrote:
>> 
>> One thing that I’m wondering if anyone familiar with the internals can 
>> explain is how you get a memory leak that isn’t freed when the program 
>> ends?  Doesn’t that suggest that it’s something lower level, like maybe a 
>> kernel issue?
> 
> If "top" doesn't show processes eating up the memory, and killing processes 
> (e.g., MPI processes) doesn't give you memory back, then it's likely that 
> something in the kernel is leaking memory.

That’s definitely what’s happening.  “free” is reporting a lot of memory used, 
but the sum of the per-process values from ps is much lower.
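
For anyone who wants to repeat that comparison, here is a rough sketch in
Python that reads /proc directly instead of using free and ps. The function
names are made up for illustration, and the "used" formula (MemTotal - MemFree
- Buffers - Cached) is only an approximation of what free prints, which varies
between procps versions. Summed RSS also counts shared pages more than once,
so a sum that is still far below "used" points at memory not owned by any
process.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def rss_kb_from_status(text):
    """Return VmRSS in kB from /proc/<pid>/status text, or 0 if absent."""
    for line in text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def total_process_rss_kb(proc="/proc"):
    """Sum VmRSS over every numeric entry (one per process) under /proc."""
    total = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                total += rss_kb_from_status(f.read())
        except OSError:
            continue  # process exited while we were scanning
    return total


def unaccounted_kb(meminfo, process_rss_kb):
    """Approximate 'used' memory minus the memory attributed to processes."""
    # Assumption: used ~= MemTotal - MemFree - Buffers - Cached.
    used = (meminfo["MemTotal"] - meminfo["MemFree"]
            - meminfo.get("Buffers", 0) - meminfo.get("Cached", 0))
    return used - process_rss_kb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    rss = total_process_rss_kb()
    print("sum of process RSS (kB):", rss)
    print("unaccounted (kB):       ", unaccounted_kb(meminfo, rss))
    # Slab and SUnreclaim show memory held by kernel slab caches.
    print("Slab (kB):              ", meminfo.get("Slab", 0))
    print("SUnreclaim (kB):        ", meminfo.get("SUnreclaim", 0))
```

A large "unaccounted" value that survives killing the MPI processes, especially
together with a growing SUnreclaim, is consistent with memory being held in
the kernel, though it does not say which driver or subsystem is holding it.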

> 
> Have you tried the latest version of UCX -- including their kernel drivers -- 
> from Mellanox (vs. inbox/CentOS)?
> 

I’ve tried the latest UCX release from the UCX web site, 1.5.1, which doesn’t 
change the behavior.

I haven’t yet tried the latest OFED or the Mellanox low-level drivers.  That’s 
next on my list, but slightly more involved to do, so I’ve been avoiding it.

                                                                thanks,
                                                                Noam
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
