Re: Memory accounting in libgc
Hello, This reminds me of a related but distinct issue - region-based allocation. Hypothetically, you could use regions to implement a limited form of this accounting, if you could ask the question, what is the size of this memory region? It certainly wouldn't do as much as custodians, but it would serve for the simple purpose of a sandbox. It would also have another nice effect, because you could allocate all of the sandbox's objects together and get memory locality when running in the sandbox. I don't know if this is the right solution, but I am curious what other people think. Best, Noah On Sun, Mar 9, 2014 at 6:48 AM, Andy Wingo wi...@pobox.com wrote: Hi, I was thinking this morning about memory accounting in libgc. There are a couple of use cases for this. One is simply asking how much memory does this object use? Another case is the common implement an IRC bot that people can use to evaluate expressions in a sandbox, but with a limit on the amount of time and space those programs can use. Let's assume that we've already arranged that the program can't write to files or do other things we wouldn't like it to do, and also that we have given it a time limit through some means. Both of these cases should be handled by GC, since GC is what can see the whole object graph in a program. The rest of this mail describes what Racket does in their GC, and how we might be able to do it in libgc. Racket has a type of objects called custodians. Custodians nest. There is one root custodian for the whole program, and creating a new custodian adds a new kid of the current custodian. There is a separate form to make a custodian current. New threads are created within the current custodian. (Racket threads are not pthreads, but this concept generalizes to pthreads.) There is also a form to enter a new custodian, effectively partitioning a thread's stack into a range owned by the old custodian, and a range owned by the new custodian. If there are multiple custodians in a program -- something that isn't always the case -- then after GC runs, their GC calls BTC_do_accounting() here: http://git.racket-lang.org/plt/blob/c27930f9399564497208eb31a44f7091d02ab7be:/racket/src/racket/gc2/mem_account.c#l417 This does a mark of the program's objects, but in a different mode (the accounting flag is set). Custodians and threads have a separate GC kind, in BDW-GC terms. When marking a custodian or a thread in accounting mode, the mark procedure only recurses into sub-objects if the custodian is current. First, all the program static roots are marked, excluding stacks, and memory use from those roots is attributed to the root custodian. Then BTC_do_accounting traverses the custodian tree, in pre-order, for each custodian setting it current and running its mark procedure, and marking stack segments for any threads created by that custodian. Any visited unmarked object is accounted to the current custodian (i.e., the size of the object is added to the current custodian's size counter). Then the tree is traversed in post-order, and child memory use is also accounted to parents. This is O(n) in live heap size and doesn't require any additional memory. Finally if any custodian is over its limit, an after-GC hook sets that custodian as scheduled for shutdown, which in Racket will close resources associated with the custodian (for example, files opened when the custodian was current), arrange to kill threads spawned by the custodian, arrange to longjmp() out of any entered custodian, etc. How does this affect libgc? First of all, it gives an answer to the question of how much memory does an object use -- simply stop the world, mark the heap in two parts (the first time ignoring the object in question, the second time starting from the object), and subtract the live heap size of the former from the latter. Libgc could do this without too much problem, it seems to me, on objects of any kind. It would be a little extra code but it could be useful. Or not? Dunno. On the question of limiting memory usage -- clearly, arranging to free memory is out of libgc's purview, as that's an application issue. But allowing an application to impose a memory limitation on a part of the program does need libgc help, and it would be most usefully done via something like a custodian tree. Custodians could be implemented by a new GC kind, so that they get a mark procedure; but then we would need a hook to be able to do accounting after GC is done, and really it seems this is best done in libgc somehow (even if its implementation is layered on top of gc kinds, mark procedures, and such). Accounting a thread's stack to multiple owners is tricky, but doable -- we could do it with a new version of GC_call_with_gc_active or so. There is also the issue of objects reachable by custodians that are peers -- where one does not dominate the other. In that case
Re: Memory accounting in libgc
On 2014-03-12 06:57, Mark H Weaver wrote: Andy Wingo wi...@pobox.com writes: How does this affect libgc? First of all, it gives an answer to the question of how much memory does an object use -- simply stop the world, mark the heap in two parts (the first time ignoring the object in question, the second time starting from the object), and subtract the live heap size of the former from the latter. Libgc could do this without too much problem, it seems to me, on objects of any kind. It would be a little extra code but it could be useful. Or not? Dunno. This could be generalized to the far more useful question: How much memory does this set of objects use?, although that's a slippery question that might better be formulated as How much memory would be freed if this set of objects were no longer needed?. For example, suppose you have a large data structure that is referenced from two small header objects, A and B. If you ask How much memory does A use?, the answer will be the size of the small header, and ditto for B. Without being able to ask the more general question, there's no way to find out how much would be freed by releasing both. Mark Absolutely agree that this would be useful, but I suspect a problem in how far one can push libgc to simulate a set of objects being freed without them actually _being_ freed. For example there could be guardians associated with the objects, and I think one can validly imagine them doing either of the possible extremes (or anywhere in between), namely: - resurrecting the objects again - freeing up a whole load more objects/memory that the guardians know to be associated with the original objects, but which wasn't (for some reason) simply referenced by them. Hope that's a useful thought - interesting subject! Neil
Re: Memory accounting in libgc
Andy Wingo wi...@pobox.com writes: How does this affect libgc? First of all, it gives an answer to the question of how much memory does an object use -- simply stop the world, mark the heap in two parts (the first time ignoring the object in question, the second time starting from the object), and subtract the live heap size of the former from the latter. Libgc could do this without too much problem, it seems to me, on objects of any kind. It would be a little extra code but it could be useful. Or not? Dunno. This could be generalized to the far more useful question: How much memory does this set of objects use?, although that's a slippery question that might better be formulated as How much memory would be freed if this set of objects were no longer needed?. For example, suppose you have a large data structure that is referenced from two small header objects, A and B. If you ask How much memory does A use?, the answer will be the size of the small header, and ditto for B. Without being able to ask the more general question, there's no way to find out how much would be freed by releasing both. Mark
Re: Memory accounting in libgc
On 03/12/14 08:57, Mark H Weaver wrote: Andy Wingo wi...@pobox.com writes: How does this affect libgc? First of all, it gives an answer to the question of how much memory does an object use -- simply stop the world, mark the heap in two parts (the first time ignoring the object in question, the second time starting from the object), and subtract the live heap size of the former from the latter. Libgc could do this without too much problem, it seems to me, on objects of any kind. It would be a little extra code but it could be useful. Or not? Dunno. This could be generalized to the far more useful question: How much memory does this set of objects use?, although that's a slippery question that might better be formulated as How much memory would be freed if this set of objects were no longer needed?. For example, suppose you have a large data structure that is referenced from two small header objects, A and B. If you ask How much memory does A use?, the answer will be the size of the small header, and ditto for B. Without being able to ask the more general question, there's no way to find out how much would be freed by releasing both. Mark Agreed. In order to build industrial-strength applications in guile, it's important to be able to answer questions such as what is causing my process' memory usage to grow? -- Andrew Gaylard
Memory accounting in libgc
Hi, I was thinking this morning about memory accounting in libgc. There are a couple of use cases for this. One is simply asking how much memory does this object use? Another case is the common implement an IRC bot that people can use to evaluate expressions in a sandbox, but with a limit on the amount of time and space those programs can use. Let's assume that we've already arranged that the program can't write to files or do other things we wouldn't like it to do, and also that we have given it a time limit through some means. Both of these cases should be handled by GC, since GC is what can see the whole object graph in a program. The rest of this mail describes what Racket does in their GC, and how we might be able to do it in libgc. Racket has a type of objects called custodians. Custodians nest. There is one root custodian for the whole program, and creating a new custodian adds a new kid of the current custodian. There is a separate form to make a custodian current. New threads are created within the current custodian. (Racket threads are not pthreads, but this concept generalizes to pthreads.) There is also a form to enter a new custodian, effectively partitioning a thread's stack into a range owned by the old custodian, and a range owned by the new custodian. If there are multiple custodians in a program -- something that isn't always the case -- then after GC runs, their GC calls BTC_do_accounting() here: http://git.racket-lang.org/plt/blob/c27930f9399564497208eb31a44f7091d02ab7be:/racket/src/racket/gc2/mem_account.c#l417 This does a mark of the program's objects, but in a different mode (the accounting flag is set). Custodians and threads have a separate GC kind, in BDW-GC terms. When marking a custodian or a thread in accounting mode, the mark procedure only recurses into sub-objects if the custodian is current. First, all the program static roots are marked, excluding stacks, and memory use from those roots is attributed to the root custodian. Then BTC_do_accounting traverses the custodian tree, in pre-order, for each custodian setting it current and running its mark procedure, and marking stack segments for any threads created by that custodian. Any visited unmarked object is accounted to the current custodian (i.e., the size of the object is added to the current custodian's size counter). Then the tree is traversed in post-order, and child memory use is also accounted to parents. This is O(n) in live heap size and doesn't require any additional memory. Finally if any custodian is over its limit, an after-GC hook sets that custodian as scheduled for shutdown, which in Racket will close resources associated with the custodian (for example, files opened when the custodian was current), arrange to kill threads spawned by the custodian, arrange to longjmp() out of any entered custodian, etc. How does this affect libgc? First of all, it gives an answer to the question of how much memory does an object use -- simply stop the world, mark the heap in two parts (the first time ignoring the object in question, the second time starting from the object), and subtract the live heap size of the former from the latter. Libgc could do this without too much problem, it seems to me, on objects of any kind. It would be a little extra code but it could be useful. Or not? Dunno. On the question of limiting memory usage -- clearly, arranging to free memory is out of libgc's purview, as that's an application issue. But allowing an application to impose a memory limitation on a part of the program does need libgc help, and it would be most usefully done via something like a custodian tree. Custodians could be implemented by a new GC kind, so that they get a mark procedure; but then we would need a hook to be able to do accounting after GC is done, and really it seems this is best done in libgc somehow (even if its implementation is layered on top of gc kinds, mark procedures, and such). Accounting a thread's stack to multiple owners is tricky, but doable -- we could do it with a new version of GC_call_with_gc_active or so. There is also the issue of objects reachable by custodians that are peers -- where one does not dominate the other. In that case the object would be charged randomly to one or the other. That's probably OK. Also, you would probably want to maintain a general set of per-custodian data -- file descriptors, for example. One would also want to maintain an estimate of non-GC allocations per-custodian (for example for third-party image processing libraries), but I don't know how that could be done; perhaps it is only possible if you register a separate GC kind for those objects, perhaps with a mark procedure. Anyway, just some thoughts. I don't know when I could implement this, but I thought I'd put this out there in case someone had any ideas about this, or pointers to literature. Cheers. Regards, Andy -- http://wingolog.org/