Re: Memory manager

2002-06-27 Thread Emery Berger
Andi Gutmans wrote:
I read the reaps article. I didn't quite understand how they actually 
code their heaps and therefore it's hard to understand how fast it 
really is.
My approach is actually similar to theirs (pools with free) and they 
even mention this kind of approach. I created pools which are internally 
managed similar to Doug Lea style.
Do these guys make their code available someplace?

We (I) will. The camera-ready copy has to go to OOPSLA in about two 
weeks, and I'll make the code available. I've also revised the text, and 
I think that the explanation of how reaps work is better.

You might want to look at our Composing High-Performance Memory 
Allocators paper, in PLDI 2001 and available from my web page, for more 
details on how the Heap Layers infrastructure works (I built reaps using 
this infrastructure).

Regards,
-- Emery
--
Emery Berger
Assistant Professor (starting Fall 2002)
Dept. of Computer Science
University of Massachusetts, Amherst
www.cs.utexas.edu/users/emery



Re: Memory manager

2002-06-26 Thread Andi Gutmans
At 11:34 AM 6/25/2002 -0700, Greg Stein wrote:
On Tue, Jun 25, 2002 at 09:07:22PM +0300, Andi Gutmans wrote:
 I won't cc: the apr dev group because it'll just clutter their list :)
Heh. Well, I trimmed out some, and am taking one point to the list.
 At 03:09 AM 6/25/2002 -0700, Greg Stein wrote:
...
 Sander mentioned something about reaps. I remember that coming up a while
 back, but am not super clear on it. IIRC, it was a synthesis of pools and
 being able to individually free items. Probably something right along the
 lines of what you're looking for.

 If you guys feel this works better for you then great!

My point is that (IIRC) reaps are essentially what you are building. There
is some actual academic research on their use. You might find that useful
for your own work.
Further, we would want to look at your work and compare that with the
reap information, and bring that into APR.
The post about reaps is here:
http://www.apachelabs.org/apr-mbox/200203.mbox/[EMAIL PROTECTED]

I read the reaps article. I didn't quite understand how they actually code 
their heaps and therefore it's hard to understand how fast it really is.
My approach is actually similar to theirs (pools with free) and they even 
mention this kind of approach. I created pools which are internally managed 
similar to Doug Lea style.
Do these guys make their code available someplace?

Andi


Re: Memory manager

2002-06-25 Thread Greg Stein
On Tue, Jun 25, 2002 at 07:20:34AM +0300, Andi Gutmans wrote:
 At 03:58 PM 6/24/2002 -0700, Greg Stein wrote:
...
 Um. We use pools in Subversion and free the memory all the time. The key is
 the use of subpools. I added some notes about our experiences at the end of
 this document:
 
  http://cvs.apache.org/viewcvs/apr-serf/docs/roadmap.txt?rev=1.3
 
 Note that pools can also be configured to not have a per-thread lock.
 
 Ouch, you really worked hard there.

Hard? Not really. The pattern is not difficult to implement, and only
certain types of loops require strict subpool usage (loops which have an
input which is pretty well unbounded (based on some user input or file or
whatever)).
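
The subpool-per-iteration pattern described above can be sketched roughly as follows. This is a toy pool written only for illustration, not the APR API: `toy_pool`, `toy_palloc`, `toy_pool_clear` and `process_items` are all hypothetical names, and each allocation is simply chained onto the pool so it can be released in bulk.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy pool (hypothetical, NOT APR): every block is chained to its pool. */
typedef union toy_node {
    union toy_node *next;
    max_align_t align;          /* keeps the payload suitably aligned */
} toy_node;

typedef struct toy_pool {
    toy_node *allocs;           /* blocks handed out since the last clear */
    size_t live;                /* number of blocks currently held */
} toy_pool;

static toy_pool *toy_pool_create(void) {
    toy_pool *p = malloc(sizeof *p);
    if (p) { p->allocs = NULL; p->live = 0; }
    return p;
}

static void *toy_palloc(toy_pool *p, size_t size) {
    toy_node *n = malloc(sizeof *n + size);
    if (!n) return NULL;
    n->next = p->allocs;
    p->allocs = n;
    p->live++;
    return n + 1;               /* payload starts right after the header */
}

/* Release everything allocated from the pool, but keep the pool usable. */
static void toy_pool_clear(toy_pool *p) {
    while (p->allocs) {
        toy_node *next = p->allocs->next;
        free(p->allocs);
        p->allocs = next;
    }
    p->live = 0;
}

static void toy_pool_destroy(toy_pool *p) {
    toy_pool_clear(p);
    free(p);
}

/* Loop over an input of unbounded size. Temporaries go into a scratch
 * subpool that is cleared on every iteration; only results that must
 * outlive the loop go into result_pool. Returns the peak number of
 * blocks the scratch pool ever held. */
static size_t process_items(toy_pool *result_pool, size_t count, size_t *kept) {
    toy_pool *iter = toy_pool_create();
    size_t peak = 0;
    if (!iter) return 0;
    for (size_t i = 0; i < count; i++) {
        toy_pool_clear(iter);                   /* toss last iteration's temporaries */
        char *tmp1 = toy_palloc(iter, 64);
        char *tmp2 = toy_palloc(iter, 128);
        if (!tmp1 || !tmp2) break;
        memset(tmp1, 0, 64);
        memset(tmp2, 0, 128);
        if (iter->live > peak) peak = iter->live;
        if (i % 100 == 0) {                     /* occasionally keep a result */
            int *keep = toy_palloc(result_pool, sizeof *keep);
            if (keep) *keep = (int)i;
        }
    }
    toy_pool_destroy(iter);
    *kept = result_pool->live;
    return peak;
}
```

However many items the loop sees, the scratch pool never holds more than one iteration's worth of temporaries, and no individual free call appears in the loop body.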

On the other side of the coin, however, is that none of our code ever is
concerned about free'ing stuff. We don't have to litter efree() throughout
our code, yet we also know that somebody will get rid of everything that we
happen to allocate [when it is appropriate].

Basically, there is a huge burden lifted by not needing to track every
allocation in the code itself.

When we *do* free (by destroying a pool), we're also getting rid of a bunch
of other, associated stuff. We never need to zero in on a particular item
and say, get rid of *that*. All allocations come in associated groups, so
we take advantage of that and place them all into a (sub)pool.

 That is exactly what we can't do in 
 PHP. Our code base is so big that the easiest solution for us has always 
 been to just give our users the memory allocation API they are used to (in 
 our case emalloc(), efree(), erealloc() and so on) and just make sure that 
 all of this memory gets freed at the end of each request (we also have some 
 leak detection code but that is coded on top of the actual memory manager).

Understood. Of course, the problem is that if somebody gets into the habit
of, well, it will just be tossed at the end of the request and *stops*
using the efree() function, then you could end up with a *huge* working set.
We saw plenty of that in Subversion :-)

Tossing (groups of) memory during unbounded iteration is always necessary,
whether using pools or an alloc/free strategy. Failing that, each item that
might ever be allocated within the loop must be individually tracked by the
code which does the alloc, and that code must then ensure it gets freed.

 Also, as PHP is a scripting language, it can run for quite a bit and do lots 
 of allocations and frees. You can't really do any planning like you guys 
 did in Subversion on exactly when stuff can be freed and when not. Grouping 
 memory allocations is virtually impossible. Anyway, it does seem that you 
 guys had to work a bit too hard.

I don't think so. While it takes some discipline, I'm not sure that I equate
that with a lot of difficulty. And your comment about when stuff can be
freed and when not simply tells me that your code is a bit too, um,
unstructured :-)

Subversion has very nice lines about when stuff is valid, and when it goes
away. Every object has a defined lifetime, and that is defined by the pool
it was placed into. We don't have destructors -- the object's death is
determined by the pool that the caller placed the object into.

Even an interpreter like PHP can be structured to have a well-defined
hierarchy of lifetimes. At the top is PHP itself. Then you have children for
each interpreter engine, maybe each thread, each time you run the compiler,
each script, etc. I'm sure there is hierarchy within that, but I'm not
familiar enough with the internals.

The mess only arrives once you start running the code :-)  But note that
the pools have already tossed all the memory associated with parsing and
compiling your script. Now you just have to worry about what gets allocated
as part of the interpretation process, and where that data might end up
getting stashed. Wherever the data goes... that determines the appropriate
lifetime. If somebody loads a new module into the interpreter, well that
probably sticks around, so it lives in the interp pool. Objects that are
instantiated are probably per-thread, while some data might be passed across
threads, so it lives in a data subpool of the interpreter.

etc.  The point is that object lifetimes *can* be well-designed, and the
pools simply mirror that structure. And also note a subtle benefit:
*because* of the pools, you think harder about lifetimes, and you organize
your code appropriately.
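
A hierarchy of lifetimes like the one described above might look roughly like this in code. It is a minimal sketch with hypothetical names (`lpool`, `lpool_create`, `lpool_destroy`), not APR or PHP code. The assumption is that every pool records its parent and that destroying a pool first destroys all of its children. The two global counters exist only so the demo can be checked.

```c
#include <stddef.h>
#include <stdlib.h>

typedef union blk {
    union blk *next;
    max_align_t align;              /* keeps the payload suitably aligned */
} blk;

/* Hypothetical lifetime pool: knows its parent and its children. */
typedef struct lpool {
    struct lpool *parent, *first_child, *next_sibling;
    blk *allocs;
} lpool;

static int live_pools = 0;          /* demo-only bookkeeping */
static int live_blocks = 0;

static lpool *lpool_create(lpool *parent) {
    lpool *p = calloc(1, sizeof *p);
    if (!p) return NULL;
    p->parent = parent;
    if (parent) {                   /* link into the parent's child list */
        p->next_sibling = parent->first_child;
        parent->first_child = p;
    }
    live_pools++;
    return p;
}

static void *lpool_alloc(lpool *p, size_t size) {
    blk *b = malloc(sizeof *b + size);
    if (!b) return NULL;
    b->next = p->allocs;
    p->allocs = b;
    live_blocks++;
    return b + 1;
}

static void lpool_destroy(lpool *p) {
    /* Children go first: nothing may outlive the pool it hangs off. */
    while (p->first_child) lpool_destroy(p->first_child);
    while (p->allocs) {
        blk *next = p->allocs->next;
        free(p->allocs);
        p->allocs = next;
        live_blocks--;
    }
    if (p->parent) {                /* unlink from the parent's child list */
        lpool **link = &p->parent->first_child;
        while (*link != p) link = &(*link)->next_sibling;
        *link = p->next_sibling;
    }
    free(p);
    live_pools--;
}

/* Returns the number of blocks still alive once the compile pool is gone,
 * or -1 if a pool could not be created. Allocation results are ignored
 * here for brevity. */
static int lifetime_demo(void) {
    lpool *process = lpool_create(NULL);
    lpool *interp  = process ? lpool_create(process) : NULL;
    lpool *compile = interp ? lpool_create(interp) : NULL;
    lpool *script  = interp ? lpool_create(interp) : NULL;
    if (!compile || !script) {
        if (process) lpool_destroy(process);
        return -1;
    }
    (void)lpool_alloc(compile, 256);    /* e.g. a parse tree */
    (void)lpool_alloc(script, 64);      /* e.g. a script-level value */
    (void)lpool_alloc(interp, 32);      /* e.g. a loaded module */
    lpool_destroy(compile);             /* compilation is over: parse data goes */
    int after_compile = live_blocks;
    lpool_destroy(process);             /* takes interp and script with it */
    return after_compile;
}
```

Where an object is allocated decides when it dies, so the caller picks the lifetime simply by picking the pool.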

  ...
   Do you guys have any interest in adding this kind of smarter memory pool
   into APR? I think it's extremely useful.
 
 Sure. Although I'm a bit unclear on how it differs from using, say,
 apr_pool_destroy on a subpool to toss intermediate memory.
 
 If I understood correctly, the difference is that you don't need to group 
 the memory but can allocate and toss memory whenever you need to. This 
 kind of planning in advance can't be done in PHP.

I think it is really about granularity. pools are about grouping together
related allocations. If PHP, or its 

RE: Memory manager

2002-06-25 Thread Andi Gutmans
At 11:28 PM 6/24/2002 +0200, Sander Striker wrote:
I think some of us have an interest in implementing reaps.  However,
I'm not going to touch the pools code to get another mechanism in place
anytime soon.  I know apr_free can be added to the current code with
little trouble, keeping the costs in the free.

I didn't mean to touch the existing pool. I thought it might be interesting 
to have an additional kind of pool so that the user has some more choice.

In any case I think it would be nice to see your code ;)

http://www.php.net/~andi/zend_mm/
Andi


Re: Memory manager

2002-06-25 Thread Andi Gutmans
At 03:58 PM 6/24/2002 -0700, Greg Stein wrote:
On Mon, Jun 24, 2002 at 11:07:43PM +0300, Andi Gutmans wrote:
...
 The APR memory pools aren't good enough for us because they don't allow
 for any freeing which just doesn't work for PHP.
Um. We use pools in Subversion and free the memory all the time. The key is
the use of subpools. I added some notes about our experiences at the end of
this document:
http://cvs.apache.org/viewcvs/apr-serf/docs/roadmap.txt?rev=1.3
Note that pools can also be configured to not have a per-thread lock.

Ouch, you really worked hard there. That is exactly what we can't do in 
PHP. Our code base is so big that the easiest solution for us has always 
been to just give our users the memory allocation API they are used to (in 
our case emalloc(), efree(), erealloc() and so on) and just make sure that 
all of this memory gets freed at the end of each request (we also have some 
leak detection code but that is coded on top of the actual memory manager).
Also, as PHP is a scripting language, it can run for quite a bit and do lots 
of allocations and frees. You can't really do any planning like you guys 
did in Subversion on exactly when stuff can be freed and when not. Grouping 
memory allocations is virtually impossible. Anyway, it does seem that you 
guys had to work a bit too hard.
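
The "familiar malloc-style API, with everything released at the end of the request" idea can be illustrated with a small sketch. This is not PHP's emalloc()/efree() implementation: `request_heap`, `req_alloc`, `req_free` and `req_shutdown` are hypothetical names. The assumption is that each allocation is linked into a per-request list, so whatever the code forgot to free can be counted (a crude leak report) and released when the request ends.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation (hypothetical layout). */
typedef union req_hdr {
    struct { union req_hdr *prev, *next; } l;
    max_align_t align;              /* keeps the payload suitably aligned */
} req_hdr;

typedef struct request_heap {
    req_hdr *head;                  /* all allocations still alive in this request */
} request_heap;

static void *req_alloc(request_heap *r, size_t size) {
    req_hdr *h = malloc(sizeof *h + size);
    if (!h) return NULL;
    h->l.prev = NULL;
    h->l.next = r->head;
    if (r->head) r->head->l.prev = h;
    r->head = h;
    return h + 1;
}

/* Individual free: unlink from the request list, then release. */
static void req_free(request_heap *r, void *ptr) {
    if (!ptr) return;
    req_hdr *h = (req_hdr *)ptr - 1;
    if (h->l.prev) h->l.prev->l.next = h->l.next;
    else r->head = h->l.next;
    if (h->l.next) h->l.next->l.prev = h->l.prev;
    free(h);
}

/* End of request: release whatever is left and report how many
 * allocations were never freed by the code that made them. */
static size_t req_shutdown(request_heap *r) {
    size_t leaked = 0;
    while (r->head) {
        req_hdr *next = r->head->l.next;
        free(r->head);
        r->head = next;
        leaked++;
    }
    return leaked;
}
```

Callers keep the alloc/free habits they already have, and the request boundary acts as a safety net rather than as the primary way memory is reclaimed.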


...
 Do you guys have any interest in adding this kind of smarter memory pool
 into APR? I think it's extremely useful.
Sure. Although I'm a bit unclear on how it differs from using, say,
apr_pool_destroy on a subpool to toss intermediate memory.

If I understood correctly, the difference is that you don't need to group 
the memory but can allocate and toss memory whenever you need to. This 
kind of planning in advance can't be done in PHP.

Andi
P.S. - I'm enthusiastically waiting for subversion. CVS just doesn't cut it 
anymore.



Re: Memory manager

2002-06-25 Thread Greg Stein
On Tue, Jun 25, 2002 at 09:07:22PM +0300, Andi Gutmans wrote:
 I won't cc: the apr dev group because it'll just clutter their list :)

Heh. Well, I trimmed out some, and am taking one point to the list.

 At 03:09 AM 6/25/2002 -0700, Greg Stein wrote:
...
 Sander mentioned something about reaps. I remember that coming up a while
 back, but am not super clear on it. IIRC, it was a synthesis of pools and
 being able to individually free items. Probably something right along the
 lines of what you're looking for.
 
 If you guys feel this works better for you then great!

My point is that (IIRC) reaps are essentially what you are building. There
is some actual academic research on their use. You might find that useful
for your own work.

Further, we would want to look at your work and compare that with the
reap information, and bring that into APR.

The post about reaps is here:

http://www.apachelabs.org/apr-mbox/200203.mbox/[EMAIL PROTECTED]

 As I mentioned I thought this memory manager could be helpful for certain 
 projects using APR. I thought it should be an additional optional pool type 
 as different apps and different developers have different needs.

You bet.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


RE: Memory manager

2002-06-24 Thread Sander Striker
Hi Andi,

 Hi,
 
 PHP uses memory allocation extensively. During the life cycle of a PHP 
 script there are a huge number of malloc() and free() calls. We found that 
 under multi-threaded web servers this leads to decreased performance due to 
 memory fragmentation and locking within the memory manager.
 The solution is using per-thread memory pools which don't lock and are 
 completely freed at the end of each request.
 Win32 supports this kind of per-thread memory pool with the 
 HeapCreate(HEAP_NO_SERIALIZE, ...) family of functions. Using this kind of 
 function gave us a huge performance gain.
 Now with Apache 2 coming out I wanted to solve this problem in a 
 cross-platform way as I don't have Bill's API available on UNIX :) The APR 
 memory pools aren't good enough for us because they don't allow for any 
 freeing which just doesn't work for PHP.
 What we did was write a memory manager (similar to Doug Lea's malloc.c but 
 much more lightweight) which allows you to have many instances (pools) and 
 it supports allocation, freeing, reallocation. At the end of each request 
 it quickly frees all of the huge memory chunks it used. I started using it 
 with the new PHP scripting engine and am allocating memory in 64KB blocks 
 (run-time definable) and it seems to work pretty well. To allocate the 
 memory blocks themselves it uses malloc() which makes it extremely 
 portable. (I actually got that idea from APR).
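
The kind of manager described in this paragraph could be sketched roughly as below. This is not the zend_mm code and is far simpler than a Doug Lea style allocator: `mini_heap`, `mh_alloc`, `mh_free` and `mh_release_all` are hypothetical names, the free list is a plain first-fit list with no splitting or coalescing, and reallocation is left out. It assumes only what the text states: many independent instances, large segments (64KB here) obtained from malloc(), individual frees, and a fast bulk release at the end of a request.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SEG_SIZE ((size_t)64 * 1024)     /* segment size taken from the message */
#define MH_ALIGN _Alignof(max_align_t)

typedef union chunk {                    /* header in front of each allocation */
    struct { size_t size; union chunk *next_free; } h;
    max_align_t align;
} chunk;

typedef union segment {                  /* header in front of each big block */
    struct { union segment *next; size_t used, cap; } s;
    max_align_t align;
} segment;

typedef struct mini_heap {               /* one instance per thread or request */
    segment *segments;
    chunk *free_list;
    size_t segment_count;
} mini_heap;

static size_t align_up(size_t n) {
    return (n + MH_ALIGN - 1) & ~(MH_ALIGN - 1);
}

static void *mh_alloc(mini_heap *h, size_t size) {
    if (size > SIZE_MAX / 2) return NULL;        /* avoid overflow below */
    size = align_up(size ? size : 1);
    /* 1. Reuse a previously freed chunk if one is big enough (first fit). */
    for (chunk **link = &h->free_list; *link; link = &(*link)->h.next_free) {
        if ((*link)->h.size >= size) {
            chunk *c = *link;
            *link = c->h.next_free;
            return c + 1;
        }
    }
    /* 2. Otherwise carve from the newest segment, adding one if needed. */
    size_t need = sizeof(chunk) + size;
    segment *seg = h->segments;
    if (!seg || seg->s.cap - seg->s.used < need) {
        size_t cap = need > SEG_SIZE ? need : SEG_SIZE;
        seg = malloc(sizeof(segment) + cap);
        if (!seg) return NULL;
        seg->s.next = h->segments;
        seg->s.used = 0;
        seg->s.cap = cap;
        h->segments = seg;
        h->segment_count++;
    }
    chunk *c = (chunk *)((char *)(seg + 1) + seg->s.used);
    seg->s.used += need;
    c->h.size = size;
    return c + 1;
}

/* Individual free: push the chunk onto the free list for later reuse. */
static void mh_free(mini_heap *h, void *ptr) {
    if (!ptr) return;
    chunk *c = (chunk *)ptr - 1;
    c->h.next_free = h->free_list;
    h->free_list = c;
}

/* End of request: hand every segment back in one pass. */
static void mh_release_all(mini_heap *h) {
    while (h->segments) {
        segment *next = h->segments->s.next;
        free(h->segments);
        h->segments = next;
    }
    h->free_list = NULL;
    h->segment_count = 0;
}
```

Because each instance is touched by a single thread, no locking is needed, and the end-of-request cost is proportional to the number of segments rather than the number of allocations.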
 
 Do you guys have any interest in adding this kind of smarter memory pool 
 into APR? I think it's extremely useful.

I think some of us have an interest in implementing reaps.  However,
I'm not going to touch the pools code to get another mechanism in place
anytime soon.  I know apr_free can be added to the current code with
little trouble, keeping the costs in the free.

In any case I think it would be nice to see your code ;)


 If you reply please cc: me because I'm not on the APR dev list.
 
 Andi

Sander



Re: Memory manager

2002-06-24 Thread Greg Stein
On Mon, Jun 24, 2002 at 11:07:43PM +0300, Andi Gutmans wrote:
...
 The APR memory pools aren't good enough for us because they don't allow
 for any freeing which just doesn't work for PHP.

Um. We use pools in Subversion and free the memory all the time. The key is
the use of subpools. I added some notes about our experiences at the end of
this document:

http://cvs.apache.org/viewcvs/apr-serf/docs/roadmap.txt?rev=1.3

Note that pools can also be configured to not have a per-thread lock.

...
 Do you guys have any interest in adding this kind of smarter memory pool 
 into APR? I think it's extremely useful.

Sure. Although I'm a bit unclear on how it differs from using, say,
apr_pool_destroy on a subpool to toss intermediate memory.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/