Re: [OMPI users] Memory manager
Terry -

It tells us that I'm not as smart as I thought :). If you're willing to help track this down, I'd like to try some other things that will require a more involved patch (it'll take me a day or two to get the patch right). Let me know if you'd be willing to look further (hopefully only another build or two) and I'll put the patch together.

Brian

On Wed, 21 May 2008, Terry Frankcombe wrote:
> Hi Brian
>
> I ran your experiment. Changing the MMAP threshold made no difference to the memory footprint (>8GB/process out of the box, an order of magnitude smaller with --with-memory-manager=none).
>
> What does that tell us?
>
> Ciao
> Terry

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Memory manager
Hi Brian

I ran your experiment. Changing the MMAP threshold made no difference to the memory footprint (>8GB/process out of the box, an order of magnitude smaller with --with-memory-manager=none).

What does that tell us?

Ciao
Terry

On Tue, 2008-05-20 at 06:51 -0600, Brian Barrett wrote:
> Terry -
>
> Would you be willing to do an experiment with the memory allocator? There are two values we change to try to make IB run faster (at the cost of corner cases you're hitting). I'm not sure one is strictly necessary, and I'm concerned that it's the one causing problems. If you don't mind recompiling again, would you change line 64 in opal/mca/memory/ptmalloc2/malloc.c from:
>
>   #define DEFAULT_MMAP_THRESHOLD (2*1024*1024)
>
> to:
>
>   #define DEFAULT_MMAP_THRESHOLD (128*1024)
>
> And then recompile with the memory manager, obviously. That will make the mmap / sbrk cross-over point the same as the default allocator in Linux. There's still one other tweak we do, but I'm almost 100% positive it's the threshold causing problems.
>
> Brian
Re: [OMPI users] Memory manager
Terry -

Would you be willing to do an experiment with the memory allocator? There are two values we change to try to make IB run faster (at the cost of corner cases you're hitting). I'm not sure one is strictly necessary, and I'm concerned that it's the one causing problems. If you don't mind recompiling again, would you change line 64 in opal/mca/memory/ptmalloc2/malloc.c from:

  #define DEFAULT_MMAP_THRESHOLD (2*1024*1024)

to:

  #define DEFAULT_MMAP_THRESHOLD (128*1024)

And then recompile with the memory manager, obviously. That will make the mmap / sbrk cross-over point the same as the default allocator in Linux. There's still one other tweak we do, but I'm almost 100% positive it's the threshold causing problems.

Brian

On May 19, 2008, at 8:17 PM, Terry Frankcombe wrote:
> To tell you all what no one wanted to tell me, yes, it does seem to be the memory manager. Compiling everything with --with-memory-manager=none returns the vmem use to the more reasonable ~100MB per process (down from >8GB).
>
> I take it this may affect my peak bandwidth over infiniband. What's the general feeling about how bad this is?

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
Re: [OMPI users] Memory manager
On Tue, May 20, 2008 at 12:17:02PM +1000, Terry Frankcombe wrote:
> To tell you all what no one wanted to tell me, yes, it does seem to be the memory manager. Compiling everything with --with-memory-manager=none returns the vmem use to the more reasonable ~100MB per process (down from >8GB).
>
> I take it this may affect my peak bandwidth over infiniband. What's the general feeling about how bad this is?

You will not be able to use the "-mca mpi_leave_pinned 1" parameter, and your micro-benchmark performance will be bad. A real application will see the difference only if it reuses communication buffers frequently.

--
Gleb.
Re: [OMPI users] Memory manager
To tell you all what no one wanted to tell me, yes, it does seem to be the memory manager. Compiling everything with --with-memory-manager=none returns the vmem use to the more reasonable ~100MB per process (down from >8GB).

I take it this may affect my peak bandwidth over infiniband. What's the general feeling about how bad this is?

On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:
> Hi folks
>
> I'm trying to run an MPI app on an infiniband cluster with OpenMPI 1.2.6.
>
> When run on a single node, this app is grabbing large chunks of memory (total per process ~8.5GB, including strace showing a single 4GB grab) but not using it. The resident memory use is ~40MB per process. When this app is compiled in serial mode (with conditionals to remove the MPI calls) the memory use is more like what you'd expect, 40MB res and ~100MB vmem.
>
> Now I didn't write it so I'm not sure what extra stuff the MPI version does, and we haven't tracked down the large memory grabs.
>
> Could it be that this vmem is being grabbed by the OpenMPI memory manager rather than directly by the app?
>
> Ciao
> Terry
[OMPI users] Memory manager
Hi folks

I'm trying to run an MPI app on an infiniband cluster with OpenMPI 1.2.6.

When run on a single node, this app is grabbing large chunks of memory (total per process ~8.5GB, including strace showing a single 4GB grab) but not using it. The resident memory use is ~40MB per process. When this app is compiled in serial mode (with conditionals to remove the MPI calls) the memory use is more like what you'd expect, 40MB res and ~100MB vmem.

Now I didn't write it so I'm not sure what extra stuff the MPI version does, and we haven't tracked down the large memory grabs.

Could it be that this vmem is being grabbed by the OpenMPI memory manager rather than directly by the app?

Ciao
Terry

--
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509
Skype: terry.frankcombe
Re: [OMPI users] Memory manager
On Nov 20, 2007, at 6:52 AM, Terry Frankcombe wrote:
> I posted this to the devel list the other day, but it raised no responses. Maybe people will have more to say here.

Sorry Terry; many of us were at the SC conference last week, and this week is short because of the US holiday. Some of the inbox got dropped/delayed as a result... (case in point: this mail sat unfinished on my laptop until I returned from the holiday today -- sorry!)

> Questions: How much does using the MPI wrappers influence the memory management at runtime?

I'm not sure what you mean here, but it's not really the MPI wrappers that are at issue. Rather, it's whether support for the memory manager was compiled into the Open MPI libraries or not. For example (and I just double checked this to be sure) -- I compiled OMPI with and without the memory manager on RHEL4U4 and the output from "mpicc --showme" is exactly the same.

> What has changed in this regard from 1.2.3 to 1.2.4?

Nothing, AFAIK...? I don't see anything in NEWS w.r.t. the memory manager stuff for v1.2.4.

> The reason I ask is that I have an f90 code that does very strange things. The structure of the code is not all that straightforward, with a "tree" of modules usually allocating their own storage (all with save applied globally within the module). Compiling with OpenMPI 1.2.4 coupled to a gcc 4.3.0 prerelease and running as a single process (with no explicit mpirun), the elements of one particular array seem to revert to previous values between where they are set and a later part of the code. (I'll refer to this as The Bug, and having the matrix elements stay as set as "expected behaviour".)

Yoinks. :-(

> The most obvious explanation would be a coding error. However, compiling and running this with OpenMPI 1.2.3 gives me the expected behaviour! As does compiling and running with a different MPI implementation and compiler set. Replacing the prerelease gcc 4.3.0 with the released 4.2.2 makes no change.
>
> The Bug is unstable. Removing calls to various routines in used modules (that otherwise do not affect the results) returns to expected behaviour at runtime. Removing a call to MPI_Recv that is never called returns to expected behaviour.
>
> Because of this I can't reduce the problem to a small testcase, and so have not included any code at this stage.

Ugh. Heisenbugs are the worst.

Have you tried with a memory checking debugger, such as valgrind, or a parallel debugger? Is there a chance that there's a simple errant posted receive (perhaps in a race condition) that is unexpectedly receiving into the Bug's memory location when you don't expect it?

> If I run the code with mpirun -np 1 the problem goes away. So one could presumably simply say "always run it with mpirun." But if this is required, why does OpenMPI not detect it?

I'm not sure what you're asking -- Open MPI does not *require* you to run with mpirun. Indeed, the memory management stuff that is in Open MPI doesn't require the use of mpirun (or not). If you run without mpirun, you'll get an MPI_COMM_WORLD size of 1 (known as a "singleton" MPI job).

> And why the difference between 1.2.3 and 1.2.4?

There are lots of differences between 1.2.3 and 1.2.4 -- see:

https://svn.open-mpi.org/trac/ompi/browser/branches/v1.2/NEWS

As for what exactly would cause it to exhibit the Bug behavior in 1.2.4 and not in 1.2.3 -- I don't know. As I said above, Heisenbugs are the worst -- changing one thing makes it [seem to] go away, etc. It could be that the Bug still exists but simply is not being obvious when you use 1.2.3. Buffer overflows can be like that, for example -- if you overflow into an area of memory that doesn't matter, then you'll never notice the bug. But if you move some data around, now perhaps that same buffer overflow will overwrite some critical memory and you *will* notice the Bug.

--
Jeff Squyres
Cisco Systems
[OMPI users] Memory manager
Hi folks

I posted this to the devel list the other day, but it raised no responses. Maybe people will have more to say here.

Questions: How much does using the MPI wrappers influence the memory management at runtime? What has changed in this regard from 1.2.3 to 1.2.4?

The reason I ask is that I have an f90 code that does very strange things. The structure of the code is not all that straightforward, with a "tree" of modules usually allocating their own storage (all with save applied globally within the module). Compiling with OpenMPI 1.2.4 coupled to a gcc 4.3.0 prerelease and running as a single process (with no explicit mpirun), the elements of one particular array seem to revert to previous values between where they are set and a later part of the code. (I'll refer to this as The Bug, and having the matrix elements stay as set as "expected behaviour".)

The most obvious explanation would be a coding error. However, compiling and running this with OpenMPI 1.2.3 gives me the expected behaviour! As does compiling and running with a different MPI implementation and compiler set. Replacing the prerelease gcc 4.3.0 with the released 4.2.2 makes no change.

The Bug is unstable. Removing calls to various routines in used modules (that otherwise do not affect the results) returns to expected behaviour at runtime. Removing a call to MPI_Recv that is never called returns to expected behaviour.

Because of this I can't reduce the problem to a small testcase, and so have not included any code at this stage.

If I run the code with mpirun -np 1 the problem goes away. So one could presumably simply say "always run it with mpirun." But if this is required, why does OpenMPI not detect it? And why the difference between 1.2.3 and 1.2.4?

Does anyone care to comment?

Ciao
Terry

--
Dr Terry Frankcombe
Physical Chemistry, Department of Chemistry
Göteborgs Universitet
SE-412 96 Göteborg Sweden
Ph: +46 76 224 0887
Skype: terry.frankcombe