Re: [O-MPI devel] [PATCH] ompi_info doesn't show use_mem_hooks flag

2005-12-07 Thread Gleb Natapov
On Tue, Dec 06, 2005 at 11:07:44AM -0500, Brian Barrett wrote:
> On Dec 6, 2005, at 10:53 AM, Gleb Natapov wrote:
> 
> > On Tue, Dec 06, 2005 at 08:33:32AM -0700, Tim S. Woodall wrote:
> >>> Also memfree hooks decrease cache efficiency, the better solution
> >>> would be to catch brk() system calls and remove memory from cache
> >>> only then, but there is no way to do it for now.
> >>
> >> We are looking at other options, including catching brk/munmap
> >> system calls, and will be experimenting w/ these on the trunk.
> >>
> > This will be really interesting. How are you going to catch brk/munmap
> > without kernel help? Last time I checked preload tricks don't work if
> > the syscall is done from inside libc itself.
> 
> All of the tricks we are looking at assume that nothing in libc calls  
> munmap.  

glibc does call mmap/munmap internally for big allocations as strace of
this program shows:

#include <stdlib.h>

int main (void)
{
    void *p = malloc (1024*1024);  /* above glibc's mmap threshold, so served by mmap() */
    free (p);                      /* and returned to the OS with munmap() */
    return 0;
}

>  We can successfully catch free() calls from inside libc  
> without any problems.  The LAM/MPI team and Myricom (with MPICH-gm)  
> have been doing this for many years without any problems.  On the  
> small percentage of MPI applications that require some linker tricks  
> (some of the commercial apps are this way), we won't be able to  
> intercept any free/munmap calls, so we're going to fall back to our  
> RDMA pipeline algorithm.
> 
Yes, but catching free is not good enough. This way we sometimes evict
cache entries that may safely remain in the cache. Ideally we should be
able to catch events that return memory to the OS (munmap/brk) and remove
the memory from cache only then.
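
For illustration, here is a small demonstration (allocator-dependent, so a
sketch rather than a guarantee) that free() of a small buffer usually keeps
the pages inside the process, which is exactly why evicting on free() is
overly aggressive:

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
    void *a = malloc (4096);
    free (a);                 /* typically just goes back on glibc's free lists */
    void *b = malloc (4096);  /* often returns the very same pages */
    printf ("first: %p  second: %p%s\n", a, b,
            a == b ? "  (same pages - nothing was returned to the OS)" : "");
    free (b);
    return 0;
}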

--
Gleb.


Re: [O-MPI devel] [PATCH] ompi_info doesn't show use_mem_hooks flag

2005-12-07 Thread Brian Barrett


On Dec 7, 2005, at 9:44 AM, Gleb Natapov wrote:


On Tue, Dec 06, 2005 at 11:07:44AM -0500, Brian Barrett wrote:

On Dec 6, 2005, at 10:53 AM, Gleb Natapov wrote:


On Tue, Dec 06, 2005 at 08:33:32AM -0700, Tim S. Woodall wrote:

Also memfree hooks decrease cache efficiency, the better solution would
be to catch brk() system calls and remove memory from cache only then,
but there is no way to do it for now.


We are looking at other options, including catching brk/munmap system
calls, and will be experimenting w/ these on the trunk.

This will be really interesting. How are you going to catch brk/munmap
without kernel help? Last time I checked preload tricks don't work if
the syscall is done from inside libc itself.


All of the tricks we are looking at assume that nothing in libc calls
munmap.


glibc does call mmap/munmap internally for big allocations as strace of
this program shows:

#include <stdlib.h>

int main (void)
{
    void *p = malloc (1024*1024);  /* above glibc's mmap threshold, so served by mmap() */
    free (p);                      /* and returned to the OS with munmap() */
    return 0;
}


Ah, yes, I wasn't clear.  On Linux, we actually ship our own version  
of ptmalloc2 (the allocator used by glibc on Linux).  We use the  
standard linker search order tricks to have the linker choose our  
versions of malloc, calloc, realloc, valloc, and free, which are from  
ptmalloc2.  We've modified our version of ptmalloc2 such that any  
time it calls mmap or sbrk with a positive number, it then  
immediately allows the cache to know about the allocation.  Any time  
it's about to call munmap or sbrk with a negative number, it informs  
the cache code before giving the memory back to the OS.  We also  
catch mmap and munmap so that we can track when the user calls mmap /  
munmap.  Note that we play with ptmalloc2's code such that it calls  
our mmap (which either uses the syscall interface directly or calls  
__mmap depending on what the system supports), so we don't intercept  
that call to mmap twice or anything like that.
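
To make that concrete, here is a minimal sketch (my assumptions, not the
actual OMPI code) of what an interposed munmap that notifies the cache at
the OS boundary could look like on Linux; cache_notify_release() is a
hypothetical hook:

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Placeholder: a real implementation would evict any registered (pinned)
   regions overlapping [addr, addr + len) from the registration cache. */
static void cache_notify_release (void *addr, size_t len)
{
    (void) addr;
    (void) len;
}

/* Our munmap: tell the cache first, then really give the pages back.
   Going through the syscall interface directly avoids depending on (or
   recursing into) libc's munmap, as described above. */
int munmap (void *addr, size_t len)
{
    cache_notify_release (addr, len);
    return (int) syscall (SYS_munmap, addr, len);
}

Such a wrapper only takes effect if the linker resolves it ahead of libc's
munmap, which is exactly where the wrapper-compiler / explicit -lopal
requirement below comes from.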


This works pretty well (like I said - it's worked fine for LAM and
MPICH-gm for years), but it requires the user to either use the wrapper
compilers or add -lmpi -lorte -lopal to the link line (i.e., shared
library dependencies can't be used to load in libopal.so); otherwise our
ptmalloc2 / mmap / munmap isn't used.  We can detect that this happened
pretty easily and then fall back to the pipelined RDMA code, which
doesn't offer the same performance but also doesn't have a pinning
problem.



 We can successfully catch free() calls from inside libc
without any problems.  The LAM/MPI team and Myricom (with MPICH-gm)
have been doing this for many years without any problems.  On the
small percentage of MPI applications that require some linker tricks
(some of the commercial apps are this way), we won't be able to
intercept any free/munmap calls, so we're going to fall back to our
RDMA pipeline algorithm.


Yes, but catching free is not good enough. This way we sometimes evict
cache entries that may safely remain in the cache. Ideally we should be
able to catch events that return memory to the OS (munmap/brk) and remove
the memory from cache only then.


This is essentially what we do on Linux - we only tell the rcache  
code about allocations / deallocations when we are talking about  
getting memory from or giving memory back to the operating system.


On Mac OS X / Darwin, due to their two-level namespaces, we can't
replace malloc / free with a customized version of the Darwin
allocator like we could with ptmalloc2.  There are some things you
can do to simulate such behavior, but it requires linking in a flat
namespace and doing some other things that nearly caused the Darwin
engineers to pass out when I was talking to them about said tricks.
So instead, we use the Darwin hooks for catching malloc / free /
etc.  It's not optimal, but it's the best we can do in the
situation.  And it doesn't force us to link all OMPI applications in
a flat namespace, which is always nice.  Of course, we still
intercept mmap / munmap in the traditional linker-tricks style.  But
again, there are very few function calls in libSystem.dylib that call
mmap that we care about (malloc / free are already taken care of by
the standard hooks), so this doesn't cause a problem.
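
As a rough sketch of the hook idea (my own illustration, not OMPI's actual
code, and assuming a 2005-era Darwin where the default malloc zone's
function table is writable), catching free through the zone looks
something like this:

#include <malloc/malloc.h>
#include <stdio.h>
#include <stdlib.h>

static void (*real_free) (malloc_zone_t *zone, void *ptr);
static unsigned long frees_seen = 0;

static void hooked_free (malloc_zone_t *zone, void *ptr)
{
    frees_seen++;             /* a real hook would notify the registration cache here */
    real_free (zone, ptr);    /* then hand the memory back to the original allocator */
}

int main (void)
{
    malloc_zone_t *zone = malloc_default_zone ();
    real_free = zone->free;
    zone->free = hooked_free;   /* swap in our wrapper for the default zone's free */

    void *p = malloc (1024 * 1024);
    free (p);
    printf ("frees observed: %lu\n", frees_seen);
    return 0;
}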


Hopefully this made some sense.  If not, on to the next round of
e-mails :).


Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




[O-MPI devel] Fwd: (j3.2005) Re: Derived types according to MPI2

2005-12-07 Thread Craig Rasmussen



Begin forwarded message:


From: Malcolm Cohen 
Date: November 21, 2005 11:23:59 AM MST
To: Aleksandar Donev 
Cc: J3 
Subject: (j3.2005) Re: Derived types according to MPI2

Aleksandar Donev said:
> Yes, but the interesting thing is neither me nor Van were aware of what
> the standard actually allows in terms of derived types and the storage
> for the components, and presumably we know Fortran better. Can storage

One might have hoped so.


> for the components be separated from the scalar derived type itself?


Hey, when *I* am the Fortran processor there's no contiguous storage,
or for that matter addressable storage!  Don't take too limited a view
of current "hard" ware.


> how something is done as long as it is done well. But if you want to
> pass an array of derived types to a parallel IO routine that is not
> compiled by your super-smart Fortran compiler that chooses to scatter
> the components across virtual-address space (yes, I mean virtual), then
> you do NOT want that abstraction.


You cannot be serious.  You do realise that there is no requirement on
any array even on intrinsic data types to contain the "actual data".
Is that a problem in practice?  No of course not.

The Fortran standard doesn't mention virtual addressing, physical
addressing or any of these things.  Is that a problem?  No.

What the standard should do (and usually does) is to specify the
behaviour of the Fortran "virtual machine", i.e. the meaning of the
program.  How that program gets mapped to hardware is way outside the
scope of the standard.

> It is about choice. Leave preaching to the preachers. Programming is a
> profession for a reason---programmers are experienced and educated and
> understand the issues and don't need lectures on abstractions.


Apparently not.
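
To make concrete what a non-Fortran I/O or communication layer typically
assumes here, a minimal C / MPI-2 sketch (an illustration, not code from
this thread; struct particle is a hypothetical element type): the derived
type is described to MPI as fixed byte displacements from the start of
each element, which only works if the components really are stored at
those displacements.

#include <mpi.h>

/* Hypothetical element type; the question above is whether a Fortran
   derived type gives the same "components at fixed offsets" guarantee
   that a C struct does. */
struct particle { double x[3]; int id; };

int main (int argc, char **argv)
{
    MPI_Init (&argc, &argv);

    struct particle p;
    MPI_Aint base, disp[2];
    MPI_Get_address (&p, &base);
    MPI_Get_address (&p.x, &disp[0]);
    MPI_Get_address (&p.id, &disp[1]);
    disp[0] -= base;              /* byte offsets of the components */
    disp[1] -= base;              /* within one element */

    int blocklens[2] = { 3, 1 };
    MPI_Datatype types[2] = { MPI_DOUBLE, MPI_INT }, particle_type;
    MPI_Type_create_struct (2, blocklens, disp, types, &particle_type);
    MPI_Type_commit (&particle_type);

    /* particle_type can now describe whole arrays of struct particle for
       I/O or communication -- but only because every element really does
       store its components at those displacements. */
    MPI_Type_free (&particle_type);
    MPI_Finalize ();
    return 0;
}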

Cheers,
--
...Malcolm Cohen, NAG Ltd., Oxford, U.K.
   (malc...@nag.co.uk)
