This case is actually quite small - 10 physical machines with 18 physical cores each, 1 rank per machine. These are AWS R4 instances (Intel Xeon E5 Broadwell processors). OpenMPI version 2.1.0, using TCP (10 Gbps).
I calculate the memory needs of my application upfront (in this case ~225 GB per machine), allocate one buffer upfront, and reuse this buffer for valid and scratch throughout processing. This is running on RHEL 7 - I'm measuring memory usage via top where I see it go up to 248 GB in an MPI-intensive portion of processing. I thought I was being quite careful with my memory allocations and there weren't any other stray allocations going on, but of course it's possible there's a large temp buffer somewhere that I've missed... based on what you're saying, this is way more memory than should be attributed to OpenMPI - is there a way I can query OpenMPI to confirm that? If the OS is unable to keep up with the network traffic, is it possible there's some low-level system buffer that gets allocated to gradually work off the TCP traffic?

Thanks.

On Thu, Dec 20, 2018 at 8:32 AM Nathan Hjelm via users <firstname.lastname@example.org> wrote:

> How many nodes are you using? How many processes per node? What kind of
> processor? Open MPI version? 25 GB is several orders of magnitude more
> memory than should be used except at extreme scale (1M+ processes). Also,
> how are you calculating memory usage?
>
> -Nathan
>
> > On Dec 20, 2018, at 4:49 AM, Adam Sylvester <op8...@gmail.com> wrote:
> >
> > Is there a way at runtime to query OpenMPI to ask it how much memory
> > it's using for internal buffers? Is there a way at runtime to set a max
> > amount of memory OpenMPI will use for these buffers? I have an application
> > where for certain inputs OpenMPI appears to be allocating ~25 GB and I'm
> > not accounting for this in my memory calculations (and thus bricking the
> > machine).
> >
> > Thanks.
> > -Adam
_______________________________________________ users mailing list email@example.com https://lists.open-mpi.org/mailman/listinfo/users
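One way to narrow down where the extra memory sits is to log the process's own resident size around the MPI-intensive portion and compare it with the system-wide figure. The sketch below is only an illustration, not anything provided by OpenMPI: it assumes Linux (as on RHEL 7) and reads the VmRSS and VmHWM fields from /proc/self/status. The function names are hypothetical. As far as I know, kernel socket buffers are not counted in a process's resident size, so if VmHWM stays near the planned ~225 GB while the machine-wide usage climbs, the difference is probably outside the process.

```python
import re


def parse_status_kib(status_text, fields=("VmRSS", "VmHWM")):
    """Return {field: KiB} for the requested fields of a /proc/<pid>/status dump.

    VmRSS is the current resident set size; VmHWM is its peak ("high water mark").
    """
    found = {}
    for line in status_text.splitlines():
        # Lines of interest look like "VmRSS:\t  123456 kB".
        match = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if match and match.group(1) in fields:
            found[match.group(1)] = int(match.group(2))
    return found


def read_self_memory_kib(path="/proc/self/status"):
    """Read the calling process's own memory counters (Linux only)."""
    with open(path) as handle:
        return parse_status_kib(handle.read())


def kib_to_gb(kib):
    """Convert KiB (what /proc labels 'kB') to decimal gigabytes."""
    return kib * 1024 / 1e9


if __name__ == "__main__":
    # Hypothetical usage: call once before and once after the MPI-heavy phase
    # on each rank and compare the two readings.
    for name, kib in sorted(read_self_memory_kib().items()):
        print(f"{name}: {kib_to_gb(kib):.2f} GB")
```

The same two fields can be read from a C or C++ application by opening /proc/self/status directly, so the check does not depend on Python being part of the job.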