Re: [Python-Dev] GC Changes
On 10/3/07, Greg Ewing [EMAIL PROTECTED] wrote:
> Martin v. Löwis wrote:
> > For stack frames, such a registration is difficult to make efficient.
>
> Also very error-prone if you happen to miss one. Although maybe no more error-prone than getting the reference counting right.

Maybe, but reference counting is really easy to debug if you screw it up. This is probably one of the primary benefits of having the majority of memory management handled by reference counting - it's deterministic and easy to debug. I'm not opposed to memory management being done entirely through garbage collection, but it would have to be vastly superior to the current system in both memory efficiency and performance.

-- Nick
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] GC Changes
> To further elaborate, the main obstacle is with extension modules. Most of them create roots and there is no defined API for the Python interpreter to find them.

That is a problem, but furthermore, I feel that local variables stored in stack frames of threads are even more difficult to integrate. For the extension-module globals, some sort of registration could be done (and PEP 3121 provides an infrastructure for that). For stack frames, such a registration is difficult to make efficient.

Regards,
Martin
Re: [Python-Dev] GC Changes
Martin v. Löwis wrote:
> For stack frames, such a registration is difficult to make efficient.

Also very error-prone if you happen to miss one. Although maybe no more error-prone than getting the reference counting right.

-- Greg
Re: [Python-Dev] GC Changes
On 02/10/2007, Adam Olsen [EMAIL PROTECTED] wrote:
> On 10/1/07, Greg Ewing [EMAIL PROTECTED] wrote:
> > Justin Tulloss wrote:
> > > Would somebody care to give me a brief overview on how the current gc module interacts with the interpreter
> >
> > The cyclic GC kicks in when memory is running low. Since
>
> This isn't true at all. It's triggered by heuristics based on the total number of allocated objects. It doesn't know how much memory is available and is not called if an allocation fails.

Correct. And that reminds me of a limitation of the Python GC: it doesn't take into account how much memory is being indirectly retained by a Python object. Like in the example I already gave, gtk.gdk.Pixbuf can easily hold hundreds of megabytes, yet the GC gives it as much consideration as it does to a simple Python integer object, which is several orders of magnitude smaller.

If someone wanted to improve the GC, that person should consider adding a protocol for the GC to retrieve the amount of memory indirectly held by a Python object, and take such memory into consideration in its heuristics.

--
Gustavo J. A. M. Carneiro
INESC Porto, Telecommunications and Multimedia Unit
The universe is always one step beyond logic. -- Frank Herbert
Re: [Python-Dev] GC Changes
On Tue, 2007-10-02 at 10:50 +0100, Gustavo Carneiro wrote:
> Correct. And that reminds me of a limitation of the Python GC: it doesn't take into account how much memory is being indirectly retained by a Python object. Like in the example I already gave, gtk.gdk.Pixbuf can easily hold hundreds of megabytes, yet the GC gives it as much consideration as it does to a simple Python integer object, which is several orders of magnitude smaller.

That sounds like a case for the Pixbuf object to have a close method (not necessarily called that) that releases the resources. The point of GC is that you normally don't care if memory is released sooner or later; for stuff you do care about, such as files, shared memory, or even large memory objects, there is always explicit management. cStringIO's close method provides a precedent.
Re: [Python-Dev] GC Changes
On 02/10/2007, Hrvoje Nikšić [EMAIL PROTECTED] wrote:
> On Tue, 2007-10-02 at 10:50 +0100, Gustavo Carneiro wrote:
> > Correct. And that reminds me of a limitation of the Python GC: it doesn't take into account how much memory is being indirectly retained by a Python object. Like in the example I already gave, gtk.gdk.Pixbuf can easily hold hundreds of megabytes, yet the GC gives it as much consideration as it does to a simple Python integer object, which is several orders of magnitude smaller.
>
> That sounds like a case for the Pixbuf object to have a close method (not necessarily called that) that releases the resources. The point of GC is that you normally don't care if memory is released sooner or later; for stuff you do care about, such as files, shared memory, or even large memory objects, there is always explicit management. cStringIO's close method provides a precedent.

I think close in real files is needed not so much because you want to free memory, but because you want to prevent data loss by flushing the file buffer into the actual file at a certain point in time. And I suspect that cStringIO just added a close() method for compatibility with real files. But of course I am speculating here...

--
Gustavo J. A. M. Carneiro
INESC Porto, Telecommunications and Multimedia Unit
The universe is always one step beyond logic. -- Frank Herbert
Re: [Python-Dev] GC Changes
On Tue, 2007-10-02 at 11:30 +0100, Gustavo Carneiro wrote:
> > even large memory objects, there is always explicit management. cStringIO's close method provides a precedent.
>
> I think close in real files is needed not so much because you want to free memory, but because you want to prevent data loss by flushing the file buffer into the actual file at a certain point in time.

You can also do that with flush. What is specific to close is that it frees up the system resources occupied by the open file. Calling it at a known point in time ensures that these external resources aren't occupied any longer than necessary, regardless of the object deallocator's policies. That is why open(filename).read() is discouraged, despite being a useful idiom in throwaway scripts and working just fine in CPython.

The fact that programmers need Pixbuf's memory released sooner rather than later might indicate that Pixbuf would benefit from a close-like method. In any case, I trust your judgment on Pixbufs, but merely point out that in general it is unwise to rely on the deallocator to release resources where timing of the release matters.

> And I suspect that cStringIO just added a close() method for compatibility with real files. But of course I am speculating here...

That could be the case, but then the method could also be a no-op. I only brought up cStringIO as a precedent for a close method that works with a memory-only object. (StringIO's close also tries to arrange for the buffer to be freed immediately, inasmuch as that is possible to achieve in pure Python.)
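As a rough illustration of the close-like method suggested for Pixbuf, here is a minimal Python sketch. ImageBuffer is a hypothetical stand-in invented for this example (it is not the PyGtk API); it only shows how an explicit close() lets the caller pick the moment the large buffer is released, independently of when the wrapper object itself is deallocated.

```python
from contextlib import closing


class ImageBuffer:
    """Hypothetical stand-in for a Pixbuf-like object owning a large buffer."""

    def __init__(self, nbytes):
        self._data = bytearray(nbytes)

    @property
    def closed(self):
        return self._data is None

    def size(self):
        if self._data is None:
            raise ValueError("operation on closed ImageBuffer")
        return len(self._data)

    def close(self):
        # Drop the large buffer now; the small wrapper object can be
        # reclaimed whenever the memory manager gets around to it.
        self._data = None


# contextlib.closing calls close() when the block exits, even on error.
with closing(ImageBuffer(1024 * 1024)) as img:
    n = img.size()

released = img.closed
```

The same reasoning is why `with open(filename) as f: data = f.read()` is usually preferred over open(filename).read() when the timing of the release matters.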
Re: [Python-Dev] GC Changes
On 10/2/07, Hrvoje Nikšić [EMAIL PROTECTED] wrote:
> On Tue, 2007-10-02 at 10:50 +0100, Gustavo Carneiro wrote:
> > Correct. And that reminds me of a limitation of the Python GC: it doesn't take into account how much memory is being indirectly retained by a Python object. Like in the example I already gave, gtk.gdk.Pixbuf can easily hold hundreds of megabytes, yet the GC gives it as much consideration as it does to a simple Python integer object, which is several orders of magnitude smaller.
>
> That sounds like a case for the Pixbuf object to have a close method (not necessarily called that) that releases the resources. The point of GC is that you normally don't care if memory is released sooner or later; for stuff you do care about, such as files, shared memory, or even large memory objects, there is always explicit management. cStringIO's close method provides a precedent.

Agreed. Objects that consume a lot of resources of any type should have a method to clean themselves up, or the language should take special notice of 'del myObject' and call the destructor immediately if nothing else refers to the object at that point, rather than leaving it up to a GC. It's just good programming practice to do such things on large objects regardless of the underlying allocation implementation.

-gps
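For what it's worth, CPython's reference counting already behaves roughly the way the 'del myObject' suggestion describes, though the language itself does not promise it. A small sketch, assuming CPython; the class name Big and the events list are made up for the illustration:

```python
events = []


class Big:
    """Hypothetical large object; __del__ records when it is finalized."""

    def __del__(self):
        events.append("finalized")


obj = Big()
alias = obj            # a second reference to the same object

del obj                # only removes one name; the object is still referenced
events_after_first_del = list(events)

del alias              # last reference gone: CPython finalizes it right here
events_after_second_del = list(events)
```

Note that del removes a name, not the object; the finalizer runs only once the last reference disappears, and other Python implementations may delay it.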
Re: [Python-Dev] GC Changes
Jeremy already covered most of it, so I'll only address specific points:

> and I think that the current gc module is of the mark-and-sweep variety.

That is incorrect. It's two phases, but in the first phase, it isn't mark, but count, and in the second phase, it's not sweep but break. To count means to count the number of references to an object, and to break means to break the cycles at selected places, expecting that reference counting will actually cause objects to become released.

> Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?

For the CPython implementation, there is no trend. The language spec explicitly states that automatic memory management is unspecified and implementation-defined, yet people rely on reference counting in many programs. That is not the reason it stays, though - it stays around because a true mark-and-sweep algorithm cannot be implemented on top of the current object API (as Jeremy explains, there is no notion of root objects in the VM).

> On a more immediately relevant note, I'm not certain I understand the full extent of the gc module. From what I've read, it sounds like it's fairly close to a fully functional GC, yet it seems to exist only as a cycle-detecting backup to the reference counting mechanism. Would somebody care to give me a brief overview on how the current gc module interacts with the interpreter, or point me to a place where that is done?

What is this that is done? The collector? I think you found Modules/gcmodule.c already. Take particular notice of the tp_traverse and tp_clear type methods.

> Why isn't the mark-and-sweep mechanism used for all memory management?

See above - it's not implementable, because the root objects are not tracked.

Regards,
Martin
Re: [Python-Dev] GC Changes
Martin v. Löwis [EMAIL PROTECTED] wrote:
> > Why isn't the mark-and-sweep mechanism used for all memory management?
>
> See above - it's not implementable, because the root objects are not tracked.

To further elaborate, the main obstacle is with extension modules. Most of them create roots and there is no defined API for the Python interpreter to find them.

Perhaps a more serious problem is that CPython and 3rd party extension modules rely heavily on the fact that the GC does not move objects. There are GC strategies that perform well when faced with high object allocation rates but, AFAIK, all of them rely on moving objects.

Neil
Re: [Python-Dev] GC Changes
Hrvoje Nikšić wrote:
> That sounds like a case for the Pixbuf object to have a close method (not necessarily called that) that releases the resources. The point of GC is that you normally don't care if memory is released sooner or later;

I think the problem here is that the GC's lack of knowledge about how much memory is being used means that you need to care more than you should have to.

-- Greg
[Python-Dev] GC Changes
Hello,

I've been doing some tests on removing the GIL, and it's becoming clear that some basic changes to the garbage collector may be needed in order for this to happen efficiently. Reference counting as it stands today is not very scalable.

I've been looking into a few options, and I'm leaning towards implementing IBM's recycler GC (http://www.research.ibm.com/people/d/dfb/recycler-publications.html) since it is very similar to what is in place now from the users' perspective. However, I haven't been around the list long enough to really understand the feeling in the community on GC in the future of the interpreter. It seems that a full GC might have a lot of benefits in terms of performance and scalability, and I think that the current gc module is of the mark-and-sweep variety. Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?

On a more immediately relevant note, I'm not certain I understand the full extent of the gc module. From what I've read, it sounds like it's fairly close to a fully functional GC, yet it seems to exist only as a cycle-detecting backup to the reference counting mechanism. Would somebody care to give me a brief overview on how the current gc module interacts with the interpreter, or point me to a place where that is done? Why isn't the mark-and-sweep mechanism used for all memory management?

Thanks a lot!
Justin
Re: [Python-Dev] GC Changes
On 01/10/2007, Justin Tulloss [EMAIL PROTECTED] wrote:
> Hello,
>
> I've been doing some tests on removing the GIL, and it's becoming clear that some basic changes to the garbage collector may be needed in order for this to happen efficiently. Reference counting as it stands today is not very scalable.
>
> I've been looking into a few options, and I'm leaning towards implementing IBM's recycler GC (http://www.research.ibm.com/people/d/dfb/recycler-publications.html) since it is very similar to what is in place now from the users' perspective. However, I haven't been around the list long enough to really understand the feeling in the community on GC in the future of the interpreter. It seems that a full GC might have a lot of benefits in terms of performance and scalability, and I think that the current gc module is of the mark-and-sweep variety. Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?
>
> On a more immediately relevant note, I'm not certain I understand the full extent of the gc module. From what I've read, it sounds like it's fairly close to a fully functional GC, yet it seems to exist only as a cycle-detecting backup to the reference counting mechanism. Would somebody care to give me a brief overview on how the current gc module interacts with the interpreter, or point me to a place where that is done? Why isn't the mark-and-sweep mechanism used for all memory management?

The cyclic GC is just too slow to react and makes programmers mad. For instance, in PyGtk we had a traditional problem with gtk.gdk.Pixbuf, which is basically an object that wraps a raw RGB image. When users deleted such an object, which could sometimes comprise tens or hundreds of megabytes, the memory was not released until much, much later. That kind of code ended up having to manually call gc.collect() to fix what was perceived by most programmers as a memory leak, which kind of defeats the purpose of a garbage collector.

This happened because PyGtk used to rely on the cyclic GC doing its work. Thankfully we moved away from that and now simple reference counting can free a Pixbuf in most cases.

The cyclic GC is a very useful system, but it should only be used in addition to, not instead of, reference counting. At least that's my personal opinion...

--
Gustavo J. A. M. Carneiro
INESC Porto, Telecommunications and Multimedia Unit
The universe is always one step beyond logic. -- Frank Herbert
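The delayed release described here can be reproduced in a few lines. This is a sketch assuming CPython; Node is a hypothetical class standing in for any object that ends up in a reference cycle, and the automatic collector is switched off only to make the timing visible.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can take part in a reference cycle."""

    def __init__(self):
        self.partner = None


gc.disable()  # keep the automatic collector out of the demonstration

# No cycle: reference counting frees the object as soon as the name goes.
single = Node()
single_probe = weakref.ref(single)
del single
freed_by_refcount = single_probe() is None

# Cycle: a and b refer to each other, so their counts never reach zero.
a, b = Node(), Node()
a.partner, b.partner = b, a
cycle_probe = weakref.ref(a)
del a, b
alive_after_del = cycle_probe() is not None

# Only a pass of the cycle collector reclaims the pair.
gc.collect()
alive_after_collect = cycle_probe() is not None

gc.enable()
```

With the collector enabled, the pair would still be freed eventually, but only when the allocation heuristics happen to trigger a collection, which is the delay PyGtk users were seeing.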
Re: [Python-Dev] GC Changes
On 10/1/07, Justin Tulloss [EMAIL PROTECTED] wrote:
> Hello,
>
> I've been doing some tests on removing the GIL, and it's becoming clear that some basic changes to the garbage collector may be needed in order for this to happen efficiently. Reference counting as it stands today is not very scalable.
>
> I've been looking into a few options, and I'm leaning towards implementing IBM's recycler GC (http://www.research.ibm.com/people/d/dfb/recycler-publications.html) since it is very similar to what is in place now from the users' perspective. However, I haven't been around the list long enough to really understand the feeling in the community on GC in the future of the interpreter. It seems that a full GC might have a lot of benefits in terms of performance and scalability, and I think that the current gc module is of the mark-and-sweep variety. Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?
>
> On a more immediately relevant note, I'm not certain I understand the full extent of the gc module. From what I've read, it sounds like it's fairly close to a fully functional GC, yet it seems to exist only as a cycle-detecting backup to the reference counting mechanism. Would somebody care to give me a brief overview on how the current gc module interacts with the interpreter, or point me to a place where that is done? Why isn't the mark-and-sweep mechanism used for all memory management?

There are probably lots of interesting things to say about the gc cycle collector, but I'll just pick a few things that seem relevant.

First off, the gc doesn't have a root set. It traces all the container objects, subtracting known references from the refcount, and is left with a root set, i.e. those objects that have some references that can't be accounted for among the known container objects. (See update_refs and subtract_refs.) In the end, we make three traversals of the heap to detect the objects that appear to be unreachable. (move_unreachable is the third.) The cycle detection relies on having the reference counts correct, so it doesn't really represent a move away from refcounting.

I skipped the generations. The GC divides the heap into three generations and tends to focus on the youngest generation. So that limits the portion of the heap that is scanned, but I don't understand the magnitude of that effect in practice.

The current collector works in collaboration with ref counting. In particular, refcounting probably handles the majority of deallocations. If the cycle detection system were responsible for all deallocations, the gc module would have a lot more work to do.

I do think it would be interesting to experiment with the recycler approach, but I think it would take a lot of work to do a decent experiment. But please give it a shot!

Jeremy
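The three traversals can be modelled in pure Python. The following is only a toy model under simplifying assumptions, not the code in Modules/gcmodule.c: objects are plain names, reference counts are given as a dict, and the function find_unreachable and its parameters are invented for this sketch.

```python
def find_unreachable(objects, refcounts, referents):
    """Toy model of the cycle detector's bookkeeping.

    objects   -- names of all tracked container objects
    refcounts -- name -> total reference count of that object
    referents -- name -> list of tracked objects it refers to
    """
    # Pass 1 (cf. update_refs): copy each refcount into a scratch value.
    gc_refs = {name: refcounts[name] for name in objects}

    # Pass 2 (cf. subtract_refs): cancel every reference that comes from
    # another tracked container. What remains are external references.
    for name in objects:
        for target in referents[name]:
            if target in gc_refs:
                gc_refs[target] -= 1

    # Pass 3 (cf. move_unreachable): objects with a positive remainder act
    # as the root set; everything reachable from them survives.
    reachable = set()
    stack = [name for name in objects if gc_refs[name] > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(t for t in referents[name] if t in gc_refs)

    return [name for name in objects if name not in reachable]


# a <-> b is an isolated cycle; c <-> d is a cycle that something outside
# the tracked set still refers to through c (hence c's count of 2).
objects = ["a", "b", "c", "d"]
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
referents = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}

garbage = find_unreachable(objects, refcounts, referents)
```

Here garbage comes out as a and b: their counts are fully explained by references inside the tracked set, while d is kept because it is reachable from c. The model also shows why the approach depends on the reference counts being correct.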
Re: [Python-Dev] GC Changes
Justin Tulloss wrote:
> Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?

It's hard to predict the future, but the general feeling I get is that many people would like to keep the current reference counting behaviour, because it has a number of nice properties:

* It's cache-friendly - it doesn't periodically go rampaging through large chunks of memory like mark-and-sweep tends to do.

* It tends to reclaim things very promptly. This is important in a language like Python that uses dynamic allocation extremely heavily, even for trivial things like integers. It also helps with caching.

* It's easy to make it interoperate with external libraries that have their own ways of managing memory. You don't have to take special steps to protect things from the garbage collector.

> Would somebody care to give me a brief overview on how the current gc module interacts with the interpreter

The cyclic GC kicks in when memory is running low. Since the reference counting takes care of any data structures that don't contain cycles, the GC only has to deal with cycles. It goes through all currently allocated objects trying to find sets whose reference counts are all accounted for by references within the set. Such a set must constitute a cycle that is not referenced anywhere from outside. It then picks an arbitrary object from the set and decrements the reference counts of all the objects it references. This breaks the cycle, and allows the reference counting mechanism to reclaim the memory.

Although the cyclic GC requires passing over large chunks of memory like mark-and-sweep, it happens far less frequently than would happen if mark-and-sweep were used for all memory management. Also, the programmer can minimise the need for it by manually breaking cycles where they are known to occur.

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
"Carpe post meridiem!" (I'm not a morning person.)
[EMAIL PROTECTED]
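As an illustration of manually breaking cycles where they are known to occur, here is a small sketch assuming CPython. The Tree class and its unlink method are hypothetical names chosen for the example; the cycle collector is disabled to show that reference counting alone is enough once the cycle is gone.

```python
import gc
import weakref


class Tree:
    """Hypothetical tree node; parent and children refer to each other."""

    def __init__(self):
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self          # child -> parent
        self.children.append(child)  # parent -> child: a cycle

    def unlink(self):
        # Break the parent/child cycle explicitly before dropping the tree.
        for child in self.children:
            child.parent = None
        self.children = []


gc.disable()  # show that no cycle collection is needed

root, leaf = Tree(), Tree()
root.add(leaf)
probe = weakref.ref(root)

root.unlink()
del root, leaf
freed_without_gc = probe() is None

gc.enable()
```

Storing the parent link as a weakref.ref is another common way to avoid creating the cycle in the first place.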
Re: [Python-Dev] GC Changes
> The cyclic GC kicks in when memory is running low.

When what memory is running low? Its default pool? System memory?

Justin
Re: [Python-Dev] GC Changes
On 10/1/07, Greg Ewing [EMAIL PROTECTED] wrote:
> Justin Tulloss wrote:
> > Would somebody care to give me a brief overview on how the current gc module interacts with the interpreter
>
> The cyclic GC kicks in when memory is running low. Since

This isn't true at all. It's triggered by heuristics based on the total number of allocated objects. It doesn't know how much memory is available and is not called if an allocation fails.

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] GC Changes
Justin Tulloss wrote:
> When what memory is running low? Its default pool? System memory?

I'm not sure of the details, but I think it keeps a high-water mark of the amount of memory allocated for Python objects so far. When that is reached, it tries to free up memory by cyclic GC, and only mallocs more if that fails.

I think it also counts the number of allocations made since the last GC and does a GC when it gets up to some threshold, so that things get cleaned out periodically and the processing is spread out somewhat.

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
"Carpe post meridiem!" (I'm not a morning person.)
[EMAIL PROTECTED]
Re: [Python-Dev] GC Changes
Adam Olsen wrote:
> This isn't true at all. It's triggered by heuristics based on the total number of allocated objects.

Hmmm, all right, it seems I don't know what I'm talking about. I'll shut up now before I spread any more misinformation. Sorry.

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
"Carpe post meridiem!" (I'm not a morning person.)
[EMAIL PROTECTED]
Re: [Python-Dev] GC Changes
On 10/1/07, Greg Ewing [EMAIL PROTECTED] wrote:
> Adam Olsen wrote:
> > This isn't true at all. It's triggered by heuristics based on the total number of allocated objects.
>
> Hmmm, all right, it seems I don't know what I'm talking about. I'll shut up now before I spread any more misinformation. Sorry.

Hey, no worries. I half expect someone to correct me. ;)

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] GC Changes
[xposted to python-ideas, reply-to python-ideas, leaving python-dev in to correct misinformation]

On Tue, Oct 02, 2007, Greg Ewing wrote:
> The cyclic GC kicks in when memory is running low.

Not at all. The sole and only basis for GC is number of allocations compared to number of de-allocations. See http://docs.python.org/lib/module-gc.html

--
Aahz ([EMAIL PROTECTED]) * http://www.pythoncraft.com/
The best way to get information on Usenet is not to ask a question, but to post the wrong information.
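The counters behind this heuristic can be inspected from Python through the gc module. A short sketch follows; the actual threshold values and the exact counting rules differ between CPython versions, so the code prints them rather than assuming particular numbers.

```python
import gc

# One threshold per generation. A collection of the youngest generation is
# considered once its counter exceeds the first threshold.
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

# The current per-generation counters that are compared to the thresholds.
counts = gc.get_count()
print("counts:", counts)

# The thresholds are tunable; here they are set back to what they already
# were, purely to show the call. No memory size is consulted anywhere.
gc.set_threshold(*thresholds)
unchanged = gc.get_threshold() == thresholds
```

Nothing in this interface refers to bytes or to available memory, which matches the correction: the trigger is a count of container allocations and deallocations.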