Re: Memory accounting in libgc

2014-03-18 Thread Noah Lavine
Hello,

This reminds me of a related but distinct issue - region-based allocation.
Hypothetically, you could use regions to implement a limited form of this
accounting, if you could ask the question, what is the size of this memory
region?

It certainly wouldn't do as much as custodians, but it would serve for the
simple purpose of a sandbox. It would also have another nice effect,
because you could allocate all of the sandbox's objects together and get
memory locality when running in the sandbox.

I don't know if this is the right solution, but I am curious what other
people think.

Best,
Noah


On Sun, Mar 9, 2014 at 6:48 AM, Andy Wingo wi...@pobox.com wrote:

 Hi,

 I was thinking this morning about memory accounting in libgc.  There are
 a couple of use cases for this.

 One is simply asking how much memory does this object use?

 Another case is the common implement an IRC bot that people can use to
 evaluate expressions in a sandbox, but with a limit on the amount of
 time and space those programs can use.  Let's assume that we've already
 arranged that the program can't write to files or do other things we
 wouldn't like it to do, and also that we have given it a time limit
 through some means.

 Both of these cases should be handled by GC, since GC is what can see
 the whole object graph in a program.  The rest of this mail describes
 what Racket does in their GC, and how we might be able to do it in
 libgc.

 Racket has a type of objects called custodians.  Custodians nest.
 There is one root custodian for the whole program, and creating a new
 custodian adds a new kid of the current custodian.  There is a separate
 form to make a custodian current.  New threads are created within the
 current custodian.  (Racket threads are not pthreads, but this concept
 generalizes to pthreads.)  There is also a form to enter a new
 custodian, effectively partitioning a thread's stack into a range owned
 by the old custodian, and a range owned by the new custodian.

 If there are multiple custodians in a program -- something that isn't
 always the case -- then after GC runs, their GC calls
 BTC_do_accounting() here:


 http://git.racket-lang.org/plt/blob/c27930f9399564497208eb31a44f7091d02ab7be:/racket/src/racket/gc2/mem_account.c#l417

 This does a mark of the program's objects, but in a different mode (the
 accounting flag is set).  Custodians and threads have a separate GC
 kind, in BDW-GC terms.  When marking a custodian or a thread in
 accounting mode, the mark procedure only recurses into sub-objects if
 the custodian is current.  First, all the program static roots are
 marked, excluding stacks, and memory use from those roots is attributed
 to the root custodian.  Then BTC_do_accounting traverses the custodian
 tree, in pre-order, for each custodian setting it current and running
 its mark procedure, and marking stack segments for any threads created
 by that custodian.  Any visited unmarked object is accounted to the
 current custodian (i.e., the size of the object is added to the current
 custodian's size counter).  Then the tree is traversed in post-order,
 and child memory use is also accounted to parents.  This is O(n) in live
 heap size and doesn't require any additional memory.

 Finally if any custodian is over its limit, an after-GC hook sets that
 custodian as scheduled for shutdown, which in Racket will close
 resources associated with the custodian (for example, files opened when
 the custodian was current), arrange to kill threads spawned by the
 custodian, arrange to longjmp() out of any entered custodian, etc.

 How does this affect libgc?

 First of all, it gives an answer to the question of how much memory
 does an object use -- simply stop the world, mark the heap in two parts
 (the first time ignoring the object in question, the second time
 starting from the object), and subtract the live heap size of the former
 from the latter.  Libgc could do this without too much problem, it seems
 to me, on objects of any kind.  It would be a little extra code but it
 could be useful.  Or not?  Dunno.

 On the question of limiting memory usage -- clearly, arranging to free
 memory is out of libgc's purview, as that's an application issue.  But
 allowing an application to impose a memory limitation on a part of the
 program does need libgc help, and it would be most usefully done via
 something like a custodian tree.  Custodians could be implemented by a
 new GC kind, so that they get a mark procedure; but then we would need a
 hook to be able to do accounting after GC is done, and really it seems
 this is best done in libgc somehow (even if its implementation is
 layered on top of gc kinds, mark procedures, and such).

 Accounting a thread's stack to multiple owners is tricky, but doable --
 we could do it with a new version of GC_call_with_gc_active or so.

 There is also the issue of objects reachable by custodians that are
 peers -- where one does not dominate the other.  In that case

Re: Memory accounting in libgc

2014-03-13 Thread Neil Jerram

On 2014-03-12 06:57, Mark H Weaver wrote:

Andy Wingo wi...@pobox.com writes:


How does this affect libgc?

First of all, it gives an answer to the question of how much memory
does an object use -- simply stop the world, mark the heap in two 
parts

(the first time ignoring the object in question, the second time
starting from the object), and subtract the live heap size of the 
former
from the latter.  Libgc could do this without too much problem, it 
seems

to me, on objects of any kind.  It would be a little extra code but it
could be useful.  Or not?  Dunno.


This could be generalized to the far more useful question: How much
memory does this set of objects use?, although that's a slippery
question that might better be formulated as How much memory would be
freed if this set of objects were no longer needed?.

For example, suppose you have a large data structure that is referenced
from two small header objects, A and B.  If you ask How much memory
does A use?, the answer will be the size of the small header, and 
ditto

for B.  Without being able to ask the more general question, there's no
way to find out how much would be freed by releasing both.

 Mark


Absolutely agree that this would be useful, but I suspect a problem in 
how far one can push libgc to simulate a set of objects being freed 
without them actually _being_ freed.  For example there could be 
guardians associated with the objects, and I think one can validly 
imagine them doing either of the possible extremes (or anywhere in 
between), namely:


- resurrecting the objects again

- freeing up a whole load more objects/memory that the guardians know to 
be associated with the original objects, but which wasn't (for some 
reason) simply referenced by them.


Hope that's a useful thought - interesting subject!

Neil




Re: Memory accounting in libgc

2014-03-12 Thread Mark H Weaver
Andy Wingo wi...@pobox.com writes:

 How does this affect libgc?

 First of all, it gives an answer to the question of how much memory
 does an object use -- simply stop the world, mark the heap in two parts
 (the first time ignoring the object in question, the second time
 starting from the object), and subtract the live heap size of the former
 from the latter.  Libgc could do this without too much problem, it seems
 to me, on objects of any kind.  It would be a little extra code but it
 could be useful.  Or not?  Dunno.

This could be generalized to the far more useful question: How much
memory does this set of objects use?, although that's a slippery
question that might better be formulated as How much memory would be
freed if this set of objects were no longer needed?.

For example, suppose you have a large data structure that is referenced
from two small header objects, A and B.  If you ask How much memory
does A use?, the answer will be the size of the small header, and ditto
for B.  Without being able to ask the more general question, there's no
way to find out how much would be freed by releasing both.

 Mark



Re: Memory accounting in libgc

2014-03-12 Thread Andrew Gaylard

On 03/12/14 08:57, Mark H Weaver wrote:

Andy Wingo wi...@pobox.com writes:


How does this affect libgc?

First of all, it gives an answer to the question of how much memory
does an object use -- simply stop the world, mark the heap in two parts
(the first time ignoring the object in question, the second time
starting from the object), and subtract the live heap size of the former
from the latter.  Libgc could do this without too much problem, it seems
to me, on objects of any kind.  It would be a little extra code but it
could be useful.  Or not?  Dunno.

This could be generalized to the far more useful question: How much
memory does this set of objects use?, although that's a slippery
question that might better be formulated as How much memory would be
freed if this set of objects were no longer needed?.

For example, suppose you have a large data structure that is referenced
from two small header objects, A and B.  If you ask How much memory
does A use?, the answer will be the size of the small header, and ditto
for B.  Without being able to ask the more general question, there's no
way to find out how much would be freed by releasing both.

  Mark

Agreed.  In order to build industrial-strength applications in guile, it's
important to be able to answer questions such as what is causing my
process' memory usage to grow?

--
Andrew Gaylard




Memory accounting in libgc

2014-03-09 Thread Andy Wingo
Hi,

I was thinking this morning about memory accounting in libgc.  There are
a couple of use cases for this.

One is simply asking how much memory does this object use?

Another case is the common implement an IRC bot that people can use to
evaluate expressions in a sandbox, but with a limit on the amount of
time and space those programs can use.  Let's assume that we've already
arranged that the program can't write to files or do other things we
wouldn't like it to do, and also that we have given it a time limit
through some means.

Both of these cases should be handled by GC, since GC is what can see
the whole object graph in a program.  The rest of this mail describes
what Racket does in their GC, and how we might be able to do it in
libgc.

Racket has a type of objects called custodians.  Custodians nest.
There is one root custodian for the whole program, and creating a new
custodian adds a new kid of the current custodian.  There is a separate
form to make a custodian current.  New threads are created within the
current custodian.  (Racket threads are not pthreads, but this concept
generalizes to pthreads.)  There is also a form to enter a new
custodian, effectively partitioning a thread's stack into a range owned
by the old custodian, and a range owned by the new custodian.

If there are multiple custodians in a program -- something that isn't
always the case -- then after GC runs, their GC calls
BTC_do_accounting() here:

  
http://git.racket-lang.org/plt/blob/c27930f9399564497208eb31a44f7091d02ab7be:/racket/src/racket/gc2/mem_account.c#l417

This does a mark of the program's objects, but in a different mode (the
accounting flag is set).  Custodians and threads have a separate GC
kind, in BDW-GC terms.  When marking a custodian or a thread in
accounting mode, the mark procedure only recurses into sub-objects if
the custodian is current.  First, all the program static roots are
marked, excluding stacks, and memory use from those roots is attributed
to the root custodian.  Then BTC_do_accounting traverses the custodian
tree, in pre-order, for each custodian setting it current and running
its mark procedure, and marking stack segments for any threads created
by that custodian.  Any visited unmarked object is accounted to the
current custodian (i.e., the size of the object is added to the current
custodian's size counter).  Then the tree is traversed in post-order,
and child memory use is also accounted to parents.  This is O(n) in live
heap size and doesn't require any additional memory.

Finally if any custodian is over its limit, an after-GC hook sets that
custodian as scheduled for shutdown, which in Racket will close
resources associated with the custodian (for example, files opened when
the custodian was current), arrange to kill threads spawned by the
custodian, arrange to longjmp() out of any entered custodian, etc.

How does this affect libgc?

First of all, it gives an answer to the question of how much memory
does an object use -- simply stop the world, mark the heap in two parts
(the first time ignoring the object in question, the second time
starting from the object), and subtract the live heap size of the former
from the latter.  Libgc could do this without too much problem, it seems
to me, on objects of any kind.  It would be a little extra code but it
could be useful.  Or not?  Dunno.

On the question of limiting memory usage -- clearly, arranging to free
memory is out of libgc's purview, as that's an application issue.  But
allowing an application to impose a memory limitation on a part of the
program does need libgc help, and it would be most usefully done via
something like a custodian tree.  Custodians could be implemented by a
new GC kind, so that they get a mark procedure; but then we would need a
hook to be able to do accounting after GC is done, and really it seems
this is best done in libgc somehow (even if its implementation is
layered on top of gc kinds, mark procedures, and such).

Accounting a thread's stack to multiple owners is tricky, but doable --
we could do it with a new version of GC_call_with_gc_active or so.

There is also the issue of objects reachable by custodians that are
peers -- where one does not dominate the other.  In that case the
object would be charged randomly to one or the other.  That's probably
OK.

Also, you would probably want to maintain a general set of per-custodian
data -- file descriptors, for example.  One would also want to maintain
an estimate of non-GC allocations per-custodian (for example for
third-party image processing libraries), but I don't know how that could
be done; perhaps it is only possible if you register a separate GC kind
for those objects, perhaps with a mark procedure.

Anyway, just some thoughts.  I don't know when I could implement this,
but I thought I'd put this out there in case someone had any ideas about
this, or pointers to literature.  Cheers.

Regards,

Andy
-- 
http://wingolog.org/