Re: [OMPI users] Memory manager

2008-05-21 Thread Brian W. Barrett

Terry -

It tells us that I'm not as smart as I thought :).  If you're willing to 
help track this down, I'd like to try some other things that will require 
a more involved patch (it'll take me a day or two to get the patch right). 
Let me know if you'd be willing to look further (hopefully only another 
build or two) and I'll put the patch together.


Brian


On Wed, 21 May 2008, Terry Frankcombe wrote:


Hi Brian

I ran your experiment.  Changing the MMAP threshold made no difference
to the memory footprint (>8GB/process out of the box, an order of
magnitude smaller with --with-memory-manager=none).

What does that tell us?

Ciao
Terry



On Tue, 2008-05-20 at 06:51 -0600, Brian Barrett wrote:

Terry -

Would you be willing to do an experiment with the memory allocator?
There are two values we change to try to make IB run faster (at the
cost of corner cases you're hitting).  I'm not sure one is strictly
necessary, and I'm concerned that it's the one causing problems.  If
you don't mind recompiling again, would you change line 64 in
opal/mca/memory/ptmalloc2/malloc.c from:

#define DEFAULT_MMAP_THRESHOLD (2*1024*1024)

to:

#define DEFAULT_MMAP_THRESHOLD (128*1024)

And then recompile with the memory manager, obviously.  That will make
the mmap / sbrk cross-over point the same as the default allocator in
Linux.  There's still one other tweak we do, but I'm almost 100%
positive it's the threshold causing problems.


Brian


On May 19, 2008, at 8:17 PM, Terry Frankcombe wrote:


To tell you all what no one wanted to tell me, yes, it does seem to be
the memory manager.  Compiling everything with
--with-memory-manager=none returns the vmem use to the more reasonable
~100MB per process (down from >8GB).

I take it this may affect my peak bandwidth over infiniband.  What's the
general feeling about how bad this is?


On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:

Hi folks

I'm trying to run an MPI app on an infiniband cluster with OpenMPI
1.2.6.

When run on a single node, this app is grabbing large chunks of memory
(total per process ~8.5GB, including strace showing a single 4GB grab)
but not using it.  The resident memory use is ~40MB per process.  When
this app is compiled in serial mode (with conditionals to remove the MPI
calls) the memory use is more like what you'd expect, 40MB res and
~100MB vmem.

Now I didn't write it so I'm not sure what extra stuff the MPI version
does, and we haven't tracked down the large memory grabs.

Could it be that this vmem is being grabbed by the OpenMPI memory
manager rather than directly by the app?

Ciao
Terry













Re: [OMPI users] Memory manager

2008-05-21 Thread Terry Frankcombe
Hi Brian

I ran your experiment.  Changing the MMAP threshold made no difference
to the memory footprint (>8GB/process out of the box, an order of
magnitude smaller with --with-memory-manager=none).

What does that tell us?

Ciao
Terry



On Tue, 2008-05-20 at 06:51 -0600, Brian Barrett wrote:
> Terry -
> 
> Would you be willing to do an experiment with the memory allocator?   
> There are two values we change to try to make IB run faster (at the  
> cost of corner cases you're hitting).  I'm not sure one is strictly  
> necessary, and I'm concerned that it's the one causing problems.  If  
> you don't mind recompiling again, would you change line 64 in opal/mca/ 
> memory/ptmalloc2/malloc.c from:
> 
> #define DEFAULT_MMAP_THRESHOLD (2*1024*1024)
> 
> to:
> 
> #define DEFAULT_MMAP_THRESHOLD (128*1024)
> 
> And then recompile with the memory manager, obviously.  That will make  
> the mmap / sbrk cross-over point the same as the default allocator in  
> Linux.  There's still one other tweak we do, but I'm almost 100%  
> positive it's the threshold causing problems.
> 
> 
> Brian
> 
> 
> On May 19, 2008, at 8:17 PM, Terry Frankcombe wrote:
> 
> > To tell you all what no one wanted to tell me, yes, it does seem to be
> > the memory manager.  Compiling everything with
> > --with-memory-manager=none returns the vmem use to the more reasonable
> > ~100MB per process (down from >8GB).
> >
> > I take it this may affect my peak bandwidth over infiniband.  What's  
> > the
> > general feeling about how bad this is?
> >
> >
> > On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:
> >> Hi folks
> >>
> >> I'm trying to run an MPI app on an infiniband cluster with OpenMPI
> >> 1.2.6.
> >>
> >> When run on a single node, this app is grabbing large chunks of  
> >> memory
> >> (total per process ~8.5GB, including strace showing a single 4GB  
> >> grab)
> >> but not using it.  The resident memory use is ~40MB per process.   
> >> When
> >> this app is compiled in serial mode (with conditionals to remove  
> >> the MPI
> >> calls) the memory use is more like what you'd expect, 40MB res and
> >> ~100MB vmem.
> >>
> >> Now I didn't write it so I'm not sure what extra stuff the MPI  
> >> version
> >> does, and we haven't tracked down the large memory grabs.
> >>
> >> Could it be that this vmem is being grabbed by the OpenMPI memory
> >> manager rather than directly by the app?
> >>
> >> Ciao
> >> Terry
> >>
> >>
> >



Re: [OMPI users] Memory manager

2008-05-20 Thread Brian Barrett

Terry -

Would you be willing to do an experiment with the memory allocator?   
There are two values we change to try to make IB run faster (at the  
cost of corner cases you're hitting).  I'm not sure one is strictly  
necessary, and I'm concerned that it's the one causing problems.  If  
you don't mind recompiling again, would you change line 64 in
opal/mca/memory/ptmalloc2/malloc.c from:


#define DEFAULT_MMAP_THRESHOLD (2*1024*1024)

to:

#define DEFAULT_MMAP_THRESHOLD (128*1024)

And then recompile with the memory manager, obviously.  That will make  
the mmap / sbrk cross-over point the same as the default allocator in  
Linux.  There's still one other tweak we do, but I'm almost 100%  
positive it's the threshold causing problems.
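
For illustration only -- not part of the proposed patch -- here is a minimal
standalone C sketch, assuming glibc's mallopt() interface, of what moving the
cross-over point does: with M_MMAP_THRESHOLD at 128 KiB, a large request is
served by mmap() outside the sbrk heap.

#include <malloc.h>    /* mallopt(), M_MMAP_THRESHOLD (glibc) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>    /* sbrk() */

int main(void)
{
    size_t big = 4UL * 1024 * 1024;          /* well above the threshold */
    char *buf;

    /* Roughly what the patched DEFAULT_MMAP_THRESHOLD does: requests of
     * 128 KiB or more are served by mmap() instead of growing the heap. */
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);

    buf = malloc(big);
    if (buf == NULL)
        return 1;
    memset(buf, 0, big);                     /* touch it so it becomes resident */

    /* An mmap()'d block lives outside the sbrk heap, so its address will
     * normally be far from the current program break printed here. */
    printf("buffer at %p, program break at %p\n", (void *)buf, sbrk(0));

    free(buf);                               /* mmap()'d blocks are unmapped here */
    return 0;
}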



Brian


On May 19, 2008, at 8:17 PM, Terry Frankcombe wrote:


To tell you all what no one wanted to tell me, yes, it does seem to be
the memory manager.  Compiling everything with
--with-memory-manager=none returns the vmem use to the more reasonable
~100MB per process (down from >8GB).

I take it this may affect my peak bandwidth over infiniband.  What's the
general feeling about how bad this is?


On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:

Hi folks

I'm trying to run an MPI app on an infiniband cluster with OpenMPI
1.2.6.

When run on a single node, this app is grabbing large chunks of memory
(total per process ~8.5GB, including strace showing a single 4GB grab)
but not using it.  The resident memory use is ~40MB per process.  When
this app is compiled in serial mode (with conditionals to remove the MPI
calls) the memory use is more like what you'd expect, 40MB res and
~100MB vmem.

Now I didn't write it so I'm not sure what extra stuff the MPI version
does, and we haven't tracked down the large memory grabs.

Could it be that this vmem is being grabbed by the OpenMPI memory
manager rather than directly by the app?

Ciao
Terry







--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] Memory manager

2008-05-20 Thread Gleb Natapov
On Tue, May 20, 2008 at 12:17:02PM +1000, Terry Frankcombe wrote:
> To tell you all what no one wanted to tell me, yes, it does seem to be
> the memory manager.  Compiling everything with
> --with-memory-manager=none returns the vmem use to the more reasonable
> ~100MB per process (down from >8GB).
> 
> I take it this may affect my peak bandwidth over infiniband.  What's the
> general feeling about how bad this is?
You will not be able to use the "-mca mpi_leave_pinned 1" parameter, and your
micro-benchmark performance will be poor.  A real application will only see
the difference if it reuses communication buffers frequently.
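
A minimal C sketch (illustration only, not from this mail) of the reuse
pattern that mpi_leave_pinned is meant to speed up: the same buffer is sent
or received over and over, so its InfiniBand registration can be cached
instead of being redone on every call.  Run it with at least two processes.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, iter;
    int n = 1 << 20;                  /* 1M doubles, reused every iteration */
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(n * sizeof(double));

    /* The same buffer is used 100 times; with a registration cache
     * (mpi_leave_pinned) it only has to be pinned for IB once. */
    for (iter = 0; iter < 100; iter++) {
        if (rank == 0)
            MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}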

> 
> 
> On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:
> > Hi folks
> > 
> > I'm trying to run an MPI app on an infiniband cluster with OpenMPI
> > 1.2.6.
> > 
> > When run on a single node, this app is grabbing large chunks of memory
> > (total per process ~8.5GB, including strace showing a single 4GB grab)
> > but not using it.  The resident memory use is ~40MB per process.  When
> > this app is compiled in serial mode (with conditionals to remove the MPI
> > calls) the memory use is more like what you'd expect, 40MB res and
> > ~100MB vmem.
> > 
> > Now I didn't write it so I'm not sure what extra stuff the MPI version
> > does, and we haven't tracked down the large memory grabs.
> > 
> > Could it be that this vmem is being grabbed by the OpenMPI memory
> > manager rather than directly by the app?
> > 
> > Ciao
> > Terry
> > 
> > 
> 

--
Gleb.


Re: [OMPI users] Memory manager

2008-05-19 Thread Terry Frankcombe
To tell you all what no one wanted to tell me, yes, it does seem to be
the memory manager.  Compiling everything with
--with-memory-manager=none returns the vmem use to the more reasonable
~100MB per process (down from >8GB).

I take it this may affect my peak bandwidth over infiniband.  What's the
general feeling about how bad this is?


On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:
> Hi folks
> 
> I'm trying to run an MPI app on an infiniband cluster with OpenMPI
> 1.2.6.
> 
> When run on a single node, this app is grabbing large chunks of memory
> (total per process ~8.5GB, including strace showing a single 4GB grab)
> but not using it.  The resident memory use is ~40MB per process.  When
> this app is compiled in serial mode (with conditionals to remove the MPI
> calls) the memory use is more like what you'd expect, 40MB res and
> ~100MB vmem.
> 
> Now I didn't write it so I'm not sure what extra stuff the MPI version
> does, and we haven't tracked down the large memory grabs.
> 
> Could it be that this vmem is being grabbed by the OpenMPI memory
> manager rather than directly by the app?
> 
> Ciao
> Terry
> 
> 



[OMPI users] Memory manager

2008-05-12 Thread Terry Frankcombe
Hi folks

I'm trying to run an MPI app on an infiniband cluster with OpenMPI
1.2.6.

When run on a single node, this app is grabbing large chunks of memory
(total per process ~8.5GB, including strace showing a single 4GB grab)
but not using it.  The resident memory use is ~40MB per process.  When
this app is compiled in serial mode (with conditionals to remove the MPI
calls) the memory use is more like what you'd expect, 40MB res and
~100MB vmem.
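
A minimal sketch (not part of the original report) of one way to read those
two numbers for the current process on Linux, via /proc/self/status:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return 1;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "VmSize:", 7) == 0 ||   /* total virtual memory */
            strncmp(line, "VmRSS:", 6) == 0)      /* resident set size */
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}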

Now I didn't write it so I'm not sure what extra stuff the MPI version
does, and we haven't tracked down the large memory grabs.

Could it be that this vmem is being grabbed by the OpenMPI memory
manager rather than directly by the app?

Ciao
Terry


-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509    Skype: terry.frankcombe



Re: [OMPI users] Memory manager

2007-11-26 Thread Jeff Squyres

On Nov 20, 2007, at 6:52 AM, Terry Frankcombe wrote:


I posted this to the devel list the other day, but it raised no
responses.  Maybe people will have more to say here.


Sorry Terry; many of us were at the SC conference last week, and this  
week is short because of the US holiday.  Some of the inbox got  
dropped/delayed as a result...


(case in point: this mail sat unfinished on my laptop until I returned  
from the holiday today -- sorry!)



Questions:  How much does using the MPI wrappers influence the memory
management at runtime?


I'm not sure what you mean here, but it's not really the MPI wrappers
that are at issue.  Rather, it's whether support for the memory
manager was compiled into the Open MPI libraries or not.  For example
(and I just double checked this to be sure) -- I compiled OMPI with
and without the memory manager on RHEL4U4 and the output from
"mpicc --showme" is exactly the same.



What has changed in this regard from 1.2.3 to 1.2.4?


Nothing, AFAIK...?  I don't see anything in NEWS w.r.t. the memory  
manager stuff for v1.2.4.



The reason I ask is that I have an f90 code that does very strange
things.  The structure of the code is not all that straightforward, with
a "tree" of modules usually allocating their own storage (all with save
applied globally within the module).  Compiling with OpenMPI 1.2.4
coupled to a gcc 4.3.0 prerelease and running as a single process (with
no explicit mpirun), the elements of one particular array seem to revert
to previous values between where they are set and a later part of the
code.  (I'll refer to this as The Bug, and having the matrix elements
stay as set as "expected behaviour".)


Yoinks.  :-(


The most obvious explanation would be a coding error.  However,
compiling and running this with OpenMPI 1.2.3 gives me the expected
behaviour!  As does compiling and running with a different MPI
implementation and compiler set.  Replacing the prerelease gcc 4.3.0
with the released 4.2.2 makes no change.

The Bug is unstable.  Removing calls to various routines in used modules
(that otherwise do not affect the results) returns to expected behaviour
at runtime.  Removing a call to MPI_Recv that is never called returns to
expected behaviour.

Because of this I can't reduce the problem to a small testcase, and so
have not included any code at this stage.


Ugh.  Heisenbugs are the worst.

Have you tried with a memory checking debugger, such as valgrind, or a  
parallel debugger?  Is there a chance that there's a simple errant  
posted receive (perhaps in a race condition) that is unexpectedly  
receiving into the Bug's memory location when you don't expect it?


If I run the code with mpirun -np 1 the problem goes away.  So one could
presumably simply say "always run it with mpirun."  But if this is
required, why does OpenMPI not detect it?


I'm not sure what you're asking -- Open MPI does not *require* you to
run with mpirun.  Indeed, the memory management stuff that is in Open
MPI doesn't depend on whether or not you use mpirun.  If you run without
mpirun, you'll get an MPI_COMM_WORLD size of 1 (known as a "singleton"
MPI job).
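
A minimal sketch of what that means in practice (assuming nothing beyond
standard MPI calls): the same binary reports a world size of 1 when launched
directly and the requested size when launched through mpirun.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Launched as "./a.out" (no mpirun): size == 1, rank == 0.
     * Launched as "mpirun -np 4 ./a.out": size == 4.            */
    printf("rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}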



And why the difference
between 1.2.3 and 1.2.4?


There are lots of differences between 1.2.3 and 1.2.4 -- see:

https://svn.open-mpi.org/trac/ompi/browser/branches/v1.2/NEWS

As for what exactly would cause it to exhibit the Bug behavior in  
1.2.4 and not in 1.2.3 -- I don't know.  As I said above, Heisenbugs  
are the worst -- changing one thing makes it [seem to] go away, etc.   
It could be that the Bug still exists but simply is not being obvious  
when you use 1.2.3.  Buffer overflows can be like that, for example --  
if you overflow into an area of memory that doesn't matter, then  
you'll never notice the bug.  But if you move some data around, now  
perhaps that same buffer overflow will overwrite some critical memory  
and you *will* notice the Bug.


--
Jeff Squyres
Cisco Systems



[OMPI users] Memory manager

2007-11-20 Thread Terry Frankcombe
Hi folks

I posted this to the devel list the other day, but it raised no
responses.  Maybe people will have more to say here.


Questions:  How much does using the MPI wrappers influence the memory
management at runtime?  What has changed in this regard from 1.2.3 to
1.2.4?


The reason I ask is that I have an f90 code that does very strange
things.  The structure of the code is not all that straightforward, with
a "tree" of modules usually allocating their own storage (all with save
applied globally within the module).  Compiling with OpenMPI 1.2.4
coupled to a gcc 4.3.0 prerelease and running as a single process (with
no explicit mpirun), the elements of one particular array seem to revert
to previous values between where they are set and a later part of the
code.  (I'll refer to this as The Bug, and having the matrix elements
stay as set as "expected behaviour".)

The most obvious explanation would be a coding error.  However,
compiling and running this with OpenMPI 1.2.3 gives me the expected
behaviour!  As does compiling and running with a different MPI
implementation and compiler set.  Replacing the prerelease gcc 4.3.0
with the released 4.2.2 makes no change.

The Bug is unstable.  Removing calls to various routines in used modules
(that otherwise do not affect the results) returns to expected behaviour
at runtime.  Removing a call to MPI_Recv that is never called returns to
expected behaviour.

Because of this I can't reduce the problem to a small testcase, and so
have not included any code at this stage.

If I run the code with mpirun -np 1 the problem goes away.  So one could
presumably simply say "always run it with mpirun."  But if this is
required, why does OpenMPI not detect it?  And why the difference
between 1.2.3 and 1.2.4?

Does anyone care to comment?

Ciao
Terry


-- 
Dr Terry Frankcombe
Physical Chemistry, Department of Chemistry
Göteborgs Universitet
SE-412 96 Göteborg Sweden
Ph: +46 76 224 0887   Skype: terry.frankcombe