This case is actually quite small: 10 physical machines with 18 physical
cores each, 1 rank per machine.  These are AWS R4 instances (Intel Xeon E5
Broadwell processors) running OpenMPI 2.1.0 over TCP (10 Gbps).

I calculate my application's memory needs upfront (in this case ~225 GB
per machine), allocate a single buffer of that size, and reuse it for both
valid data and scratch space throughout processing.  This is running on
RHEL 7.  I'm measuring memory usage with top, where I see it climb to
248 GB during an MPI-intensive portion of processing.
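One way to cross-check the number from top is to have each rank log its own resident memory just before and just after the MPI-intensive phase.  Below is a minimal sketch in C.  It assumes Linux (RHEL 7 qualifies), where the kernel reports a process's current resident set as VmRSS and its peak as VmHWM, both in kB, in /proc/self/status.  The helper names read_status_kb and log_memory are hypothetical and are not part of OpenMPI.

```c
#include <stdio.h>
#include <string.h>

/* Look up a "Key:   <value> kB" line in /proc/self/status (Linux only).
 * Returns the value in kB, or -1 if the file or the key is not found. */
static long read_status_kb(const char *key)
{
    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return -1;

    char line[512];
    size_t key_len = strlen(key);
    long value = -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Require an exact key match followed by ':' (e.g. "VmRSS:"). */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            /* %ld skips the whitespace between the colon and the number. */
            if (sscanf(line + key_len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(fp);
    return value;
}

/* Print the current (VmRSS) and peak (VmHWM) resident memory of this process. */
static void log_memory(const char *label)
{
    const double kb_per_gib = 1024.0 * 1024.0;
    long rss_kb = read_status_kb("VmRSS");
    long hwm_kb = read_status_kb("VmHWM");

    if (rss_kb < 0 || hwm_kb < 0) {
        fprintf(stderr, "[%s] /proc/self/status not available\n", label);
        return;
    }
    fprintf(stderr, "[%s] VmRSS = %.2f GiB, VmHWM = %.2f GiB\n",
            label, rss_kb / kb_per_gib, hwm_kb / kb_per_gib);
}
```

Calling log_memory("before exchange") and log_memory("after exchange") around the communication phase on each rank shows how far the peak rises above the ~225 GB buffer.  Going by the top figures, that gap is roughly 248 - 225 = 23 GB per machine.  Note that resident memory only counts pages that have actually been touched, so part of a large buffer that has not yet been written will not show up in VmRSS.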

I thought I was being careful with my memory allocations and that there
weren't any other stray allocations, but of course it's possible there's a
large temporary buffer somewhere that I've missed.  Based on what you're
saying, this is far more memory than should be attributed to OpenMPI.  Is
there a way I can query OpenMPI to confirm that?  Also, if the OS is unable
to keep up with the network traffic, could some low-level system buffer be
getting allocated to gradually work off the TCP traffic?
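The low-level-buffer hypothesis can be checked separately.  On Linux, data queued in TCP send and receive buffers is kernel memory.  It therefore does not appear in a process's resident size (the RES column in top), although it does count toward system-wide memory use.  The kernel reports the total currently charged to TCP sockets as the mem field (a page count) on the "TCP:" line of /proc/net/sockstat.  Below is a minimal sketch, assuming Linux.  The function names are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* If `line` is the "TCP:" line of /proc/net/sockstat, return its "mem"
 * field (a page count); otherwise return -1. */
static long parse_tcp_mem_pages(const char *line)
{
    long inuse, orphan, tw, alloc, mem;
    if (sscanf(line, "TCP: inuse %ld orphan %ld tw %ld alloc %ld mem %ld",
               &inuse, &orphan, &tw, &alloc, &mem) == 5)
        return mem;
    return -1;
}

/* Kernel memory currently charged to TCP socket buffers, in bytes
 * (Linux only). Returns -1 if /proc/net/sockstat cannot be read or parsed. */
static long long tcp_socket_buffer_bytes(void)
{
    FILE *fp = fopen("/proc/net/sockstat", "r");
    if (fp == NULL)
        return -1;

    char line[256];
    long pages = -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        pages = parse_tcp_mem_pages(line);
        if (pages >= 0)
            break;
    }
    fclose(fp);

    /* The "mem" field is in pages; convert to bytes. */
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size <= 0)
        return -1;
    return (long long)pages * page_size;
}
```

You could sample this during the MPI-intensive phase and compare it with the process's own resident size.  That comparison should show whether the extra memory is held in the kernel or inside the process.  The system-wide total is capped by the net.ipv4.tcp_mem sysctl, which is also expressed in pages.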


On Thu, Dec 20, 2018 at 8:32 AM Nathan Hjelm via users <> wrote:

> How many nodes are you using? How many processes per node? What kind of
> processor? Open MPI version? 25 GB is several orders of magnitude more
> memory than should be used except at extreme scale (1M+ processes). Also,
> how are you calculating memory usage?
> -Nathan
> > On Dec 20, 2018, at 4:49 AM, Adam Sylvester <> wrote:
> >
> > Is there a way at runtime to query OpenMPI to ask it how much memory
> > it's using for internal buffers?  Is there a way at runtime to set a max
> > amount of memory OpenMPI will use for these buffers?  I have an application
> > where for certain inputs OpenMPI appears to be allocating ~25 GB and I'm
> > not accounting for this in my memory calculations (and thus bricking the
> > machine).
> >
> > Thanks.
> > -Adam
> > _______________________________________________
> > users mailing list
> >
> >