Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Charles A Taylor via users
This looks a lot like a problem I had with OpenMPI 3.1.2. I thought the fix was landed in 4.0.0 but you might want to check the code to be sure there wasn’t a regression in 4.1.x. Most of our codes are still running 3.1.2 so I haven’t built anything beyond 4.0.0 which definitely included the

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Noam Bernstein via users
> On Jun 20, 2019, at 4:44 AM, Charles A Taylor wrote: > This looks a lot like a problem I had with OpenMPI 3.1.2. I thought the fix was landed in 4.0.0 but you might want to check the code to be sure there wasn’t a regression in 4.1.x. Most of our codes are still running 3.1.2

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Yann Jobic via users
Hi, On 6/20/2019 at 3:31 PM, Noam Bernstein via users wrote: On Jun 20, 2019, at 4:44 AM, Charles A Taylor wrote: This looks a lot like a problem I had with OpenMPI 3.1.2. I thought the fix was landed in 4.0.0 but you might want to check the code to be sure

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread John Hearns via users
Errr.. have you dropped caches? echo 3 > /proc/sys/vm/drop_caches On Thu, 20 Jun 2019 at 15:59, Yann Jobic via users wrote: > Hi, > On 6/20/2019 at 3:31 PM, Noam Bernstein via users wrote: >> On Jun 20, 2019, at 4:44 AM, Charles A Taylor wrote:
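
A minimal sketch of the check being suggested here, assuming root access on the compute node (compare the "used" and "buff/cache" columns before and after):

    # snapshot memory before touching anything
    free -h
    # flush dirty pages, then drop page cache, dentries, and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches
    # if "used" falls back to normal, the growth was cache, not a real leak
    free -h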

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Noam Bernstein via users
> On Jun 20, 2019, at 12:25 PM, Carlson, Timothy S wrote: > As of recent you needed to use --with-slurm and --with-pmi2. While the configure line indicates it picks up pmi2 as part of slurm, that is not in fact true and you need to specifically tell it about pmi2. When I do
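
A quick way to sanity-check what a build actually picked up; a sketch assuming an in-tree build, and component names vary a bit between Open MPI versions:

    # did configure actually find the Slurm PMI2 headers/libraries?
    grep -i pmi config.log | head
    # after installation, list the PMI/PMIx-related bits that were built
    ompi_info | grep -i pmi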

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Noam Bernstein via users
> On Jun 20, 2019, at 12:55 PM, Carlson, Timothy S wrote: > Just pass /usr to configure instead of /usr/include/slurm. This seems to have done it (as did passing CPPFLAGS, but this feels cleaner). Thank you all for the suggestions.

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Joseph Schuchart via users
Noam, Another idea: check for stale files in /dev/shm/ (or a subdirectory that looks like it belongs to UCX/OpenMPI) and SysV shared memory using `ipcs -m`. Joseph On 6/20/19 3:31 PM, Noam Bernstein via users wrote: On Jun 20, 2019, at 4:44 AM, Charles A Taylor
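
A short sketch of this kind of check, using standard tools; the /dev/shm filename patterns left behind vary by UCX/Open MPI version, and <shmid> is a placeholder:

    # leftover shared-memory backing files from dead MPI jobs show up here
    ls -lh /dev/shm/
    # SysV segments: nattch=0 with an owner process that no longer exists is suspicious
    ipcs -m
    # remove a stale SysV segment by id, only once you are sure it is orphaned
    ipcrm -m <shmid>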

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Jeff Squyres (jsquyres) via users
On Jun 20, 2019, at 1:34 PM, Noam Bernstein wrote: > Aha - using Mellanox’s OFED packaging seems to have essentially (if not 100%) fixed the issue. There still appears to be some small leak, but it’s of order 1 GB, not 10s of GB, and it doesn’t grow continuously. And on later runs of

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Nathan Hjelm via users
THAT is a good idea. When using Omnipath we see an issue with stale files in /dev/shm if the application exits abnormally. I don't know if UCX uses that space as well. -Nathan On June 20, 2019 at 11:05 AM, Joseph Schuchart via users wrote: Noam, Another idea: check for stale files in

Re: [OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users
> On Jun 20, 2019, at 12:10 PM, Carlson, Timothy S wrote: > I’ve never seen that error and have built some flavor of this combination dozens of times. What version of Intel Compiler and what version of OpenMPI are you trying to build? [chasman@login4 gizmo-mufasa]$ ifort -V

Re: [OMPI users] Intel Compilers

2019-06-20 Thread Jeff Squyres (jsquyres) via users
Can you send the exact ./configure line you are using to configure Open MPI? > On Jun 20, 2019, at 12:32 PM, Charles A Taylor via users wrote: >> On Jun 20, 2019, at 12:10 PM, Carlson, Timothy S wrote: >> I’ve never seen that error and have built some flavor of this

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Jeff Squyres (jsquyres) via users
Ok. Perhaps we still missed something in the configury. Worst case, you can: $ ./configure CPPFLAGS=-I/usr/include/slurm ...rest of your configure params... That will add the -I to CPPFLAGS, and it will preserve that you set that value in the top few lines of config.log. On Jun 20, 2019,
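
For reference, a hedged sketch of the two variants that came up in this thread (exact PMI option names depend on the Open MPI version; paths assume a stock Slurm install under /usr):

    # workaround: add the Slurm PMI2 header directory to the preprocessor path
    ./configure CPPFLAGS=-I/usr/include/slurm ...rest of your configure params...
    # what ultimately worked per the follow-up: point the PMI option at /usr instead of /usr/include/slurm
    ./configure --with-slurm --with-pmi2=/usr ...rest of your configure params...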

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Charles A Taylor via users
Sure…
+ ./configure --build=x86_64-redhat-linux-gnu \
    --host=x86_64-redhat-linux-gnu \
    --program-prefix= \
    --disable-dependency-tracking \
    --prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \
    --exec-prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1 \

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Noam Bernstein via users
> On Jun 20, 2019, at 10:42 AM, Noam Bernstein via users wrote: > I haven’t yet tried the latest OFED or Mellanox low level stuff. That’s next on my list, but slightly more involved to do, so I’ve been avoiding it. Aha - using Mellanox’s OFED packaging seems to essentially (if not

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Noam Bernstein via users
> On Jun 20, 2019, at 1:38 PM, Nathan Hjelm via users wrote: > THAT is a good idea. When using Omnipath we see an issue with stale files in /dev/shm if the application exits abnormally. I don't know if UCX uses that space as well. No stale shm files. echo 3 >

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Jeff Squyres (jsquyres) via users
On Jun 14, 2019, at 2:02 PM, Noam Bernstein via users wrote: > Hi Jeff - do you remember this issue from a couple of months ago? Noam: I'm sorry, I totally missed this email. My INBOX is a continual disaster. :-( > Unfortunately, the failure to find pmi.h is still happening. I just

[OMPI users] Intel Compilers

2019-06-20 Thread Charles A Taylor via users
OpenMPI probably has one of the largest and most complete configure+build systems I’ve ever seen. I’m surprised, however, that it doesn’t pick up the use of the Intel compilers and modify the command line parameters as needed. ifort: command line warning #10006: ignoring unknown option '-pipe'
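
The usual way to make configure use the Intel toolchain is to name the compilers explicitly; a minimal sketch, with the install prefix borrowed from the configure line quoted elsewhere in this thread:

    ./configure CC=icc CXX=icpc FC=ifort \
        --prefix=/apps/mpi/intel/2019.1.144/openmpi/4.0.1
    make -j 8 && make install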

Re: [OMPI users] OpenMPI 4 and pmi2 support

2019-06-20 Thread Noam Bernstein via users
> On Jun 20, 2019, at 11:54 AM, Jeff Squyres (jsquyres) wrote: > On Jun 14, 2019, at 2:02 PM, Noam Bernstein via users wrote: >> Hi Jeff - do you remember this issue from a couple of months ago? > Noam: I'm sorry, I totally missed this email. My INBOX is a continual

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Jeff Squyres (jsquyres) via users
On Jun 20, 2019, at 9:31 AM, Noam Bernstein via users wrote: > One thing that I’m wondering if anyone familiar with the internals can explain is how you get a memory leak that isn’t freed when the program ends? Doesn’t that suggest that it’s something lower level, like maybe a kernel

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread John Hearns via users
The kernel using memory is why I suggested running slabtop, to see the kernel slab allocations. Clearly I was barking up the wrong tree there... On Thu, 20 Jun 2019 at 14:41, Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote: > On Jun 20, 2019, at 9:31 AM, Noam Bernstein via
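
For completeness, a short sketch of that slab check (standard tools; run as root for full detail):

    # top kernel slab caches, sorted by cache size; watch for caches that keep growing across MPI runs
    slabtop -o -s c | head -20
    # overall kernel slab usage
    grep -i slab /proc/meminfo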

Re: [OMPI users] growing memory use from MPI application

2019-06-20 Thread Noam Bernstein via users
> On Jun 20, 2019, at 9:40 AM, Jeff Squyres (jsquyres) wrote: > On Jun 20, 2019, at 9:31 AM, Noam Bernstein via users wrote: >> One thing that I’m wondering if anyone familiar with the internals can explain is how you get a memory leak that isn’t freed when the program