> On Jun 20, 2019, at 9:40 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> wrote:
>
> On Jun 20, 2019, at 9:31 AM, Noam Bernstein via users
> <users@lists.open-mpi.org> wrote:
>>
>> One thing that I’m wondering, if anyone familiar with the internals can
>> explain, is how you get a memory leak that isn’t freed when the program
>> ends. Doesn’t that suggest that it’s something lower level, like maybe a
>> kernel issue?
>
> If "top" doesn't show processes eating up the memory, and killing processes
> (e.g., MPI processes) doesn't give you memory back, then it's likely that
> something in the kernel is leaking memory.
That’s definitely what’s happening. “free” reports a lot of memory used, but the sum of the per-process values from ps is much lower.

>
> Have you tried the latest version of UCX -- including their kernel drivers --
> from Mellanox (vs. inbox/CentOS)?
>

I’ve tried the latest UCX from the UCX web site, 1.5.1, which doesn’t change the behavior. I haven’t yet tried the latest OFED or Mellanox low-level stuff. That’s next on my list, but it’s slightly more involved, so I’ve been avoiding it.

thanks,
Noam

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
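The check described in this thread (comparing the system-wide "used" figure with the sum of per-process memory) can be scripted. Below is a minimal Python sketch that assumes a Linux /proc filesystem. The function names are hypothetical. "Used" is approximated as MemTotal - MemAvailable, which is close to, but not necessarily identical to, what a given version of free prints.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def rss_kb_from_status(text):
    """Return VmRSS in kB from /proc/<pid>/status text, or 0 if absent (e.g. kernel threads)."""
    for line in text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def total_process_rss_kb(proc="/proc"):
    """Sum VmRSS over every numeric /proc entry (i.e. every visible process)."""
    total = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                total += rss_kb_from_status(f.read())
        except OSError:
            # The process exited between listdir() and open(), or is unreadable; skip it.
            continue
    return total


def main():
    if not os.path.exists("/proc/meminfo"):
        print("No /proc/meminfo found; this sketch assumes Linux.")
        return
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    # Approximation of "used"; older kernels without MemAvailable fall back to MemFree.
    used_kb = mem["MemTotal"] - mem.get("MemAvailable", mem["MemFree"])
    rss_kb = total_process_rss_kb()
    print(f"used (MemTotal - MemAvailable): {used_kb / 1024:.0f} MiB")
    print(f"sum of process RSS:             {rss_kb / 1024:.0f} MiB")
    print(f"unaccounted:                    {(used_kb - rss_kb) / 1024:.0f} MiB")
    # Places where memory can live without appearing in any process's RSS.
    for key in ("Slab", "SUnreclaim", "Shmem"):
        if key in mem:
            print(f"{key}: {mem[key] / 1024:.0f} MiB")


if __name__ == "__main__":
    main()
```

Because RSS counts a shared page once for every process that maps it, the RSS sum tends to overstate process memory. A large positive "unaccounted" value that persists after all MPI jobs have exited therefore points at memory held outside any process's address space, for example in kernel slab caches (Slab/SUnreclaim) or in shared-memory files that no process still maps (Shmem). By itself, it does not identify which driver is responsible.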