Re: Queue cleanup
Dennis Lee Bieber wrote: greg.ew...@canterbury.ac.nz declaimed the following So maybe we need to redesign the hardware. Remember the iAPX-432? http://en.wikipedia.org/wiki/Intel_iAPX_432#Garbage_collection Not quite what I had in mind. That sounds like a conventional GC algorithm that happens to be implemented in microcode. I'm thinking about ways of designing the memory itself to help with GC. Instead of putting all the smarts in the CPU, move some of them into the RAM. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On 9/8/2010 6:20 PM, Paul Rubin wrote: What tricks on tricks? Even the fanciest GC's are orders of magnitude less complicated than any serious database, optimizing compiler, OS kernel, file system, etc. Real-world software is complicated. Get used to that fact, and look for ways to manage the complexity and keep complex programs safe. Choosing to program with unsafe methods because you wish the complexity didn't exist is just deluded. Garbage collectors are difficult from a theoretical standpoint, and it's very difficult to make a correct concurrent garbage collector without using formal methods. But garbage collectors are not large pieces of code. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7xeid9gtuq@ruckus.brouhaha.com, Paul Rubin wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: That reinforces my point, about how easy it was to check the correctness of the code. In this case one simple fix, like this ... would render the code watertight. See how easy it is? Well, no, it's irrelevant how easy it is to fix the issue after it's pointed out. In that case, I can similarly discount your attempts to fix up issues with garbage collectors after they’re pointed out, can I not? Part of the problem is C itself. And yet, what are these complicated garbage collectors, that you intend relying on to work correctly with all their layers of tricks upon tricks, written in? C. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: In that case, I can similarly discount your attempts to fix up issues with garbage collectors after they’re pointed out, can I not? Adapting GC algorithms to improve their performance or better use the capabilities of new hardware is much different than saying they didn't work in the first place. They've been around for 5 decades, they (like Python programs) work fine if you don't mind the performance cost, and for many applications that cost is acceptable even for quite simple and naive GC's. The continued work is to get that cost down even further. (And before criticizing GC performance, have you ever profiled CPython to see how much time it's spending messing with reference counts? I didn't think so). Part of the problem is C itself. And yet, what are these complicated garbage collectors, that you intend relying on to work correctly with all their layers of tricks upon tricks, written in? C. What tricks on tricks? Even the fanciest GC's are orders of magnitude less complicated than any serious database, optimizing compiler, OS kernel, file system, etc. Real-world software is complicated. Get used to that fact, and look for ways to manage the complexity and keep complex programs safe. Choosing to program with unsafe methods because you wish the complexity didn't exist is just deluded. It actually seems pretty crazy to me to write a garbage collector in C today, even though a GC needs unsafe pointer operations in a few places. C made some sense in the 1980's when computers were smaller and people didn't know better. A lot of the C code around today is legacy code from that era. These days it makes more sense to use a safe language with a few isolated jailbreaks (like an OS kernel that has a few lines of assembly code) than to write the whole thing in a language whose unsafety is everywhere. Here's a good paper by R. 
Fateman (criticizing C) that I just came across: http://www.franz.com/resources/educational_resources/white_papers/fault.prevention.pdf He suggests using Lisp instead, but one can't have everything ;-). FWIW, here are a couple pages about verifying GC's: http://flint.cs.yale.edu/flint/publications/hgc.html http://www.cs.technion.ac.il/~erez/Papers/verified-gc-popl09.pdf etc. I just don't understand that weird agenda you seem to have. But whatever. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Thu, 09 Sep 2010 12:41:20 +1200, Lawrence D'Oliveiro wrote: Part of the problem is C itself. And yet, what are these complicated garbage collectors, that you intend relying on to work correctly with all their layers of tricks upon tricks, written in? C. Not necessarily. Pascal, despite the contempt it is held in by university computer science departments, isn't quite dead, and some Pascal compilers use garbage collectors written in Pascal. FreePascal, I believe, is one of them. Likewise for other not-dead-yet low-level languages like Ada and Forth. As surprising as it seems to many, C is not the only low-level language around suitable for writing high-quality, efficient code. Just ask the Lisp community, which is thriving. For some definition of thriving. Admittedly C has far more attention to it than the others, so [insert weasel words here] the best C compilers tend to produce more efficient code than the best of the others, but Pascal, Ada and similar give you more security than C. I believe that when computer scientists of the future look back at the last few decades, they will judge that on balance C did more harm than good. Not that C is the only language in which people can write buggy or insecure code, but C does tend to give the bugs so much help... :) -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: But you’ll notice that Free Software comes with no such restrictions. In fact, it is contrary to commonly-accepted Free Software guidelines to impose any sort of restrictions on areas of use. Free software comes with an explicit disclaimer of liability (you get what you pay for). The Sun stuff ($) may have either an express or implied warranty that could mean they get hit up for damages if the software is wrong. IANAL YMMV etc. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Paul Rubin wrote: Now extrapolate that error rate from 30 lines to a program the size of Firefox (something like 5 MLOC), and you should see how fraught with danger that style of programming is. But you don't write 5 MLOC of code using that programming style. You use it to write a small core something along the lines of, oh, I don't know, a Python interpreter, and then write the rest of the code on top of that platform. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro wrote: But alone of all of these, garbage collection still remains just as costly to implement as ever. That should tell you something about how poorly it matches the characteristics of modern computing hardware. So maybe we need to redesign the hardware. Perhaps reference counts could be stored in their own special area of memory, updated in parallel with main memory accesses so that they don't slow anything down. Make it multiported so that all your cores can access it at once without locking. Maybe even build it out of counters instead of registers, so there's no data bus, only an address bus and inc/dec control lines. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
John Nagle na...@animats.com writes: I've argued for an approach in which only synchronized or immutable objects can be shared between threads. Then, only synchronized objects have refcounts. See http://www.animats.com/papers/languages/pythonconcurrency.html; I'm going to have to read this carefully, but my first reaction is that doing it right would take substantial changes to the language, to the point where it wouldn't really be Python any more. More generally, if you want to program in Erlang, why not use Erlang for that? I can't think of a time when I've really had to use a finalizer for something with dynamic extent. They've always seemed like a code smell to me. The problem appears when you have an object that owns something, like a window or a database connection. With is single-level. But expecting such an object to be gc'd seems like a code smell in its own right. I once implemented something like that in a Lisp system, and it was just disconcerting as hell to see windows on the screen blink out of existence unpredictably. The issue is maybe your function returns and you expect the window to vanish, but something somewhere has saved another temporary reference to it so it doesn't go away til later. If you want something with external semantics (like a window or file handle) to be released at a particular time, the program should do that explicitly. Don't use a finalizer that you expect storage reclamation to invoke. There is just too little control in a Python program over the creation of references. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
John Nagle na...@animats.com writes: Unoptimized reference counting, which is what CPython does, isn't all that great either. The four big bottlenecks in Python are boxed numbers, attribute lookups, reference count updates, and the GIL. The performance hit of having to lock the refcounts before update has been the historical reason for keeping the GIL. The LOCK prefix takes something like 100 cycles on an x86. Is optimizing the refcount updates going to anywhere near make up for that? Python's with statement as an approach to RAII has seemed ok to me. I can't think of a time when I've really had to use a finalizer for something with dynamic extent. They've always seemed like a code smell to me. -- http://mail.python.org/mailman/listinfo/python-list
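[Editorial note: the RAII-via-`with` approach mentioned above can be sketched in a few lines. The `Connection` class and its attributes here are purely illustrative, not from any real library.]

```python
class Connection:
    """Hypothetical resource with an explicit release point."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs deterministically when the with-block exits, even on an
        # exception -- no reliance on when the object is reclaimed.
        self.close()
        return False

with Connection() as conn:
    assert not conn.closed   # resource is live inside the block
assert conn.closed           # released at a predictable point
```

The contrast with a finalizer (`__del__`) is exactly the timing: `__exit__` fires at a known source location, whereas a finalizer fires whenever the last reference happens to disappear.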
Re: Queue cleanup
On 9/4/2010 11:51 PM, Paul Rubin wrote: John Naglena...@animats.com writes: Unoptimized reference counting, which is what CPython does, isn't all that great either. The four big bottlenecks in Python are boxed numbers, attribute lookups, reference count updates, and the GIL. The performance hit of having to lock the refcounts before update has been the historical reason for keeping the GIL. The LOCK prefix takes something like 100 cycles on an x86. Is optimizing the refcount updates going to anywhere near make up for that? I've argued for an approach in which only synchronized or immutable objects can be shared between threads. Then, only synchronized objects have refcounts. See http://www.animats.com/papers/languages/pythonconcurrency.html; Guido doesn't like it. He doesn't like any restrictions. So we're stuck dragging around the boat anchor. I'd hoped that the Unladen Swallow people might come up with some really clever solution, but they seem to be stuck. It's been almost a year since the last quarterly release. Maybe Google is putting their effort into Go. What's so striking is that Shed Skin can deliver 20x to 60x speedups over CPython, while PyPy and Unladen Swallow have trouble getting 1.5x. The question is how much one has to restrict the language to get a serious performance improvement. Python's with statement as an approach to RAII has seemed ok to me. I can't think of a time when I've really had to use a finalizer for something with dynamic extent. They've always seemed like a code smell to me. The problem appears when you have an object that owns something, like a window or a database connection. With is single-level. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Dennis Lee Bieber wlfr...@ix.netcom.com writes: Not to mention having to ensure that one finds ALL the references to the object so that they can be updated to the new address! Somehow I don't see a C compiler being smart enough to find intermediary pointer We're not talking about C compilers, which can cast any value at all into pointers. Languages designed for garbage collection are normally type-safe and gc is a well-understood problem, though (like compilers) the higher-performing ones are complicated. But, nothing in principle stops anyone from implementing Python with such methods. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
[gc] In article 7x7hj2kyd6@ruckus.brouhaha.com, Paul Rubin no.em...@nospam.invalid wrote: A minimal naive implementation indeed doubles the memory requirements, but from a Python perspective where every integer takes something like 24 bytes already, even that doesn't seem so terrible. Many people still use 32-bit Python -- an int is twelve bytes there. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ ...if I were on life-support, I'd rather have it run by a Gameboy than a Windows box. --Cliff Wells -- http://mail.python.org/mailman/listinfo/python-list
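[Editorial note: the per-object overhead being discussed is easy to inspect with `sys.getsizeof`. Exact figures vary by CPython version and build — 24 bytes for an int on 64-bit CPython 2.x, 12 on 32-bit builds, 28 for a small Python 3 int are typical.]

```python
import sys

# Per-object overhead in CPython: even a small integer carries a
# refcount and a type pointer in its header. Exact sizes depend on
# the interpreter version and whether it is a 32- or 64-bit build.
print(sys.getsizeof(1))
print(sys.getsizeof(1.0))
print(sys.getsizeof([]))   # even an empty list pays header overhead
```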
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: That reinforces my point, about how easy it was to check the correctness of the code. In this case one simple fix, like this ... would render the code watertight. See how easy it is? Well, no, it's irrelevant how easy it is to fix the issue after it's pointed out. What matters is how easy it was to create it in the first place. You posted a 30-line code sample as obviously free of memory leaks, but even a good programmer like you didn't notice that it had the potential for a nasty memory overwrite error after an unbalanced decref. Keep in mind that a memory leak usually just means the program can eventually bog down and stops working, but an overwrite error is often a security hole that can lead to total compromise of your entire computer. Now extrapolate that error rate from 30 lines to a program the size of Firefox (something like 5 MLOC), and you should see how fraught with danger that style of programming is. Even the most skilled and careful programmers are going to slip up once in a while. Part of the problem is C itself. No language can eliminate many complicated bugs without creating severe usability problems, but good languages (unlike C) can eliminate most silly bugs. I had a dark night of the soul after reading some of the bug-finding papers at http://www.stanford.edu/~engler/ and have been terrified of C ever since. I'm just always skeptical when anyone says they're sure any piece of C code is obviously bug-free. It's quite easy to get overconfident about it (as I've done myself more than once). I spent about 5 minutes reviewing your patched code (and the underlying implementations of the C API functions it calls) and didn't see any other issues, and the code is probably ok now, but I'd have to spend a lot more time tracing through the API layer before I could be really sure. Anyway, you should check your patch into github if you haven't. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On 8/28/2010 5:42 AM, Aahz wrote: In article 4c78572c$0$28655$c3e8...@news.astraweb.com, Steven D'Aprano st...@remove-this-cybersource.com.au wrote: On Fri, 27 Aug 2010 09:16:52 -0700, Aahz wrote: In article mailman.1967.1281549328.1673.python-l...@python.org, MRAB pyt...@mrabarnett.plus.com wrote: An object will be available for garbage collection when nothing refers to it either directly or indirectly. If it's unreferenced then it will go away. This isn't actually garbage collection as most people think of it. Refcounting semantics mean that objects get reaped as soon as nothing points at them. OTOH, CPython does also have garbage collection to back up refcounting so that when you have unreferenced object cycles they don't stay around. I've repeatedly asked, both here and elsewhere, why reference counting isn't real garbage collection. Nobody has been able to give me a satisfactory answer. As far as I can tell, it's a bit of pretentiousness with no basis in objective fact. You'll notice that I was very careful to qualify my statement with as most people think of it. Also, because CPython has two different memory management mechanisms, refcounting and cycle detection, and the module that controls cycle detection is called gc, I think it's simpler to follow along with the Python docs -- and critically important to remind people that there are in fact two different systems. Personally, I'd like to have reference counting only, an enforced prohibition on loops (backpointers must be weak pointers), RAII, and reliably ordered finalization. A big advantage of reference counting is that finalization happens in the thread that releases the object, and in the correct order. GC and finalization/destructors do not play well together at all. Microsoft once tried to get the hard cases to work right. See managed C++. Not a happy story. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
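[Editorial note: the two mechanisms distinguished above — immediate refcount reclamation and the cycle detector in the `gc` module — can both be observed directly. A small sketch; `gc` is disabled up front so the cycle detector runs only when called explicitly.]

```python
import gc
import weakref

gc.disable()   # make sure the cycle detector runs only when we ask

class Node:
    pass

# Acyclic case: refcounting reclaims the object the moment the last
# reference disappears -- no collector pass involved.
n = Node()
r = weakref.ref(n)
del n
assert r() is None

# Cyclic case: the refcounts never reach zero, so the objects survive
# until the cycle detector (the gc module) runs.
a, b = Node(), Node()
a.other, b.other = b, a
r = weakref.ref(a)
del a, b
assert r() is not None   # still alive, kept by the cycle
gc.collect()
assert r() is None       # reclaimed by the cycle detector
gc.enable()
```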
Re: Queue cleanup
In message mailman.434.1283568372.29448.python-l...@python.org, MRAB wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Wonder why Sun’s licence explicitly forbade its use in danger-critical areas like nuclear power plants and the like, then? I thought it was just that if it wasn't explicitly forbidden then someone might try to use it and then sue if something went wrong, even though common sense would have said that it was a bad idea in the first place! :-) But you’ll notice that Free Software comes with no such restrictions. In fact, it is contrary to commonly-accepted Free Software guidelines to impose any sort of restrictions on areas of use. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 4c82b097$0$1661$742ec...@news.sonic.net, John Nagle wrote: Personally, I'd like to have reference counting only, an enforced prohibition on loops (backpointers must be weak pointers), RAII, and reliably ordered finalization. Is there a cheap way of checking at runtime for circular structures? A big advantage of reference counting is that finalization happens in the thread that releases the object, and in the correct order. GC and finalization/destructors do not play well together at all. Microsoft once tried to get the hard cases to work right. See managed C++. Not a happy story. Thank you for that. Another arrow for my anti-GC quiver. :) -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7x7hj2kyd6@ruckus.brouhaha.com, Paul Rubin wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: In message 7xmxs2uez1@ruckus.brouhaha.com, Paul Rubin wrote: GC's for large systems generally don't free (or even examine) individual garbage objects. They copy the live objects to a new contiguous heap without ever touching the garbage, and then they release the old heap. And suddenly you’ve doubled the memory requirements. And on top of that, since you’re moving the valid objects into different memory, you’re forcing cache misses on all of them as well. A minimal naive implementation indeed doubles the memory requirements, but from a Python perspective where every integer takes something like 24 bytes already, even that doesn't seem so terrible. Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding, right? More sophisticated implementations use multiple small heaps or other tricks. More and more complications to patch up the idea. At some point, you have to admit there is something fundamentally flawed about the whole concept. The new heap is filled sequentially so accesses to it will have good locality. Unfortunately, that’s not how locality of reference works. It doesn’t matter whether the objects being accessed are close together in memory or far apart (not with modern fully-associative caches, anyway), what does matter is the frequency distribution of references, namely that the vast majority of references are to a tiny minority of objects. Your generational garbage collector completely breaks this assumption, by regularly forcing an access to every single object in the heap. Cache-thrashing, anyone? It's also the case that programs with very large memory consumption tend to use most of the memory for large arrays that don't contain pointers (think of a database server with a huge cache). That means the gc doesn't really have to think about all that much of the memory. 
But your generational garbage collector still has to copy those very large objects to the new heap, with all the cache-hostile consequences therefrom. By the way, isn’t this the opposite of the array-of-pointers example you were using earlier to try to cast reference-counting in a bad light? It seems to me a reference count would work very well for such a large, simple object. This is the continuing problem with garbage collection: all the attempts to make it cheaper just end up moving the costs somewhere else. Same thing with manual allocation. That moves the costs off the computer and onto the programmer. Not good, most of the time. Unfortunately, your argument falls down. It is a truism that hardware costs continue to come down, while programmers remain expensive. As I said before, computing performance has improved by something like five orders of magnitude over the last half-century. This has rendered all kinds of techniques, like high-level languages, dynamic memory allocation, stacks, hardware floating-point, memory protection and so on, which were once considered controversial because of their expense, cheap enough to become commonplace. But not garbage collection. This is because of the asymmetric way in which hardware has become faster: the biggest improvements have been in the parts that were already the fastest to begin with (the CPU), while RAM speeds have improved much less, and backing-store speeds least of all. Hence the need for intermediate layers of cache to bridge the gap. But the effectiveness of that caching depends crucially on certain assumptions about the runtime behaviour of the programs: and garbage collection breaks those assumptions. Really, I'm no gc expert, but the stuff you're saying about gc is quite ill-informed. You might want to check out some current literature. You may want to enlighten yourself by meditating on this seeming paradox of modern computing hardware: memory is cheap, but accessing memory is expensive. 
-- http://mail.python.org/mailman/listinfo/python-list
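[Editorial note: the "copy the live objects to a new contiguous heap" scheme argued about above is a semispace (Cheney-style) copying collector. A toy sketch over a simulated heap may make the mechanics concrete — everything here is illustrative, not any real collector; cells are tagged tuples and pointers are heap indices.]

```python
def collect(heap, roots):
    """Cheney-style copy: move everything reachable from `roots` into a
    fresh heap, never touching garbage. `heap` is a list of cells:
    ('int', n) for data, ('ref', i) for a pointer to slot i.
    Returns (new_heap, new_roots)."""
    new_heap = []
    forward = {}                 # old index -> new index (forwarding table)

    def copy(i):
        if i in forward:         # already moved: reuse forwarding address
            return forward[i]
        forward[i] = len(new_heap)
        new_heap.append(heap[i]) # shallow copy; pointer fixed up below
        return forward[i]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(new_heap):  # the new heap doubles as the work queue
        tag, payload = new_heap[scan]
        if tag == 'ref':
            new_heap[scan] = ('ref', copy(payload))
        scan += 1
    return new_heap, new_roots

# Slot 0 points at slot 1 (live); slot 2 is garbage and is never examined.
heap = [('ref', 1), ('int', 42), ('int', 99)]
new_heap, new_roots = collect(heap, roots=[0])
assert new_heap == [('ref', 1), ('int', 42)]   # garbage left behind
```

Note how the collector's cost is proportional to the live data only, while the old heap (garbage included) is dropped wholesale — which is exactly the property both sides of the thread are arguing about.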
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: A minimal naive implementation indeed doubles the memory requirements, but from a Python perspective where every integer takes something like 24 bytes already, even that doesn't seem so terrible. Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding, right? No, it would be doubling 4 or 8 bytes. The extra overhead like the reference count would not be there to bloat up the integer like in Python. More sophisticated implementations use multiple small heaps or other tricks. More and more complications to patch up the idea. At some point, you have to admit there is something fundamentally flawed about the whole concept. Oh sheesh, that's silly. Every area of programming where performance matters is full of optimization tricks. Look at any serious database implementation for example. Or any compiler. Look at Python's implementation of dictionaries. Yeah, the optimizations add complexity to improve performance, sometimes even in heuristic ways that can fail. That doesn't mean the concepts are fundamentally flawed. GC is no different. The new heap is filled sequentially so accesses to it will have good locality. what does matter is the frequency distribution of references, Sorry, I just meant during the gc operation itself. The gc's access pattern in the new heap is completely sequential as the gc just copies stuff to it linearly from the old heap, bumping a pointer upwards. The access pattern when the user application is running is of course not predictable. Your generational garbage collector completely breaks this assumption, by regularly forcing an access to every single object in the heap. Cache-thrashing, anyone? In the minor collections, the whole minor heap fits in cache, so there's no thrashing. The major collections can use a different strategy, or you can simply rely on their relative infrequency. Why do you speculate like this? 
If you run a GHC program with profiling active, it tells you exactly how much time is spent in minor gc and how much time is in major gc, and it's all generally pretty tolerable unless your program has bugs. (Unfortunately Haskell programs are notoriously susceptible to a certain type of bug that causes them to still give the right answers, but use much more memory than they should. The usual sign of that happening is high gc load). It's also the case that programs with very large memory consumption tend to use most of the memory for large arrays that don't contain pointers But your generational garbage collector still has to copy those very large objects to the new heap, with all the cache-hostile consequences therefrom. Not necessarily, depends on how you write the program and how the gc works. By the way, isn’t this the opposite of the array-of-pointers example you were using earlier to try to cast reference-counting in a bad light? I wasn't trying to cast reference counting in a bad light, I was pointing out that reference counting can experience pauses just like traditional gc approaches. Most programs including soft real time programs can tolerate an occasional pause. If your program is not one of those, and you need guaranteed upper bounds on pauses so you can't use traditional gc, switching from gc to reference counting won't save you. It seems to me a reference count would work very well for such a large, simple object. Mark/sweep would do it too. Some gc's use a hybrid approach, with mark/sweep for older or larger objects. But not garbage collection. This is because of the asymmetric way in which hardware has become faster:... the effectiveness of that caching depends crucially on certain assumptions about the runtime behaviour of the programs: and garbage collection breaks those assumptions. ... You may want to enlighten yourself by meditating on this seeming paradox of modern computing hardware: memory is cheap, but accessing memory is expensive. 
I'm interested in receiving enlightment if you've got some pointers into the research literature that back up your views. Right now it sounds like you're going by some intuitions you have that aren't backed up by evidence. Anyone who has performance-tuned a program knows that intuition can give reasonable starting points for experiments and measurements, but once the results start coming in, a lot of the intuitions end up invalidated. By now, there is enough experimental and theoretical literature about gc that opinions like yours, that don't seem to be grounded in any knowledge of that literature, are not very persuasive no matter how superficially attractive the raw intuition might be. Especially in your case, where you seem to have decided ahead of time what conclusion you want to reach and are looking for ways to justify it. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On 9/4/2010 6:44 PM, Lawrence D'Oliveiro wrote: In message4c82b097$0$1661$742ec...@news.sonic.net, John Nagle wrote: Personally, I'd like to have reference counting only, an enforced prohibition on loops (backpointers must be weak pointers), RAII, and reliably ordered finalization. Is there a cheap way of checking at runtime for circular structures? It's an interesting technical problem to design a system where circular references are detected immediately, at the moment of creation. However, Python already detects loops during garbage collection. If you set gc.set_debug(gc.DEBUG_SAVEALL) all the loops show up in gc.garbage. A big advantage of reference counting is that finalization happens in the thread that releases the object, and in the correct order. GC and finalization/destructors do not play well together at all. Microsoft once tried to get the hard cases to work right. See managed C++. Not a happy story. Thank you for that. Another arrow for my anti-GC quiver. :) Unoptimized reference counting, which is what CPython does, isn't all that great either. The four big bottlenecks in Python are boxed numbers, attribute lookups, reference count updates, and the GIL. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
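[Editorial note: the `gc.set_debug(gc.DEBUG_SAVEALL)` behaviour John describes is easy to demonstrate. A sketch — the total object count in `gc.garbage` also includes the instances' `__dict__`s and varies by Python version, so we filter for the instances themselves.]

```python
import gc

gc.disable()
gc.set_debug(gc.DEBUG_SAVEALL)   # park collected objects in gc.garbage

class Node:
    pass

a, b = Node(), Node()
a.peer, b.peer = b, a            # a reference loop
del a, b

gc.collect()
# With DEBUG_SAVEALL, everything the collector found unreachable is
# kept in gc.garbage for inspection instead of being freed.
looped = [o for o in gc.garbage if isinstance(o, Node)]
print(len(looped))               # the two Nodes from the loop
gc.set_debug(0)
```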
Re: Queue cleanup
In message 7xvd6sv0n4@ruckus.brouhaha.com, Paul Rubin wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: AddrObj = PyTuple_GetItem(TheBufferInfo, 0); LenObj = PyTuple_GetItem(TheBufferInfo, 1); the first PyTuple_GetItem succeeds and the second one fails. Admittedly, I did take a shortcut here: array.buffer_info returns a tuple of two items, so I’m not expecting one GetItem to succeed and the other to fail. FromArray is a parameter to the function, with no type check to make sure it's really an array. In fact your code allows for the possibility that it doesn't support the buffer_info operation (if I understand the purpose of the null return check after the PyObject_CallMethod) which means it's prepared for the argument to -not- be an array. That reinforces my point, about how easy it was to check the correctness of the code. In this case one simple fix, like this

diff --git a/spuhelper.c b/spuhelper.c
index 83fd4eb..2ba8197 100644
--- a/spuhelper.c
+++ b/spuhelper.c
@@ -151,10 +151,12 @@ static void GetBufferInfo
     if (TheBufferInfo == 0)
         break;
     AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
-    LenObj = PyTuple_GetItem(TheBufferInfo, 1);
     if (PyErr_Occurred())
         break;
     Py_INCREF(AddrObj);
+    LenObj = PyTuple_GetItem(TheBufferInfo, 1);
+    if (PyErr_Occurred())
+        break;
     Py_INCREF(LenObj);
     *addr = PyInt_AsUnsignedLongMask(AddrObj);
     *len = PyInt_AsUnsignedLongMask(LenObj);

would render the code watertight. See how easy it is? -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7xiq2que93@ruckus.brouhaha.com, Paul Rubin wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Refcounting is susceptible to the same pauses for reasons already discussed. Doesn’t seem to happen in the real world, though.

    def f(n):
        from time import time
        a = [1] * n
        t0 = time()
        del a
        t1 = time()
        return t1 - t0

    for i in range(9):
        print i, f(10**i)

on my system prints:

    0 2.86102294922e-06
    1 2.14576721191e-06
    2 3.09944152832e-06
    3 1.00135803223e-05
    4 0.000104904174805
    5 0.00098991394043
    6 0.00413608551025
    7 0.037693977356
    8 0.362598896027

Looks pretty linear as n gets large. 0.36 seconds (the last line) is a noticeable pause. Which just proves the point. You had to deliberately set up the situation to make that happen. And it remains just as easy to pinpoint where it is happening, so you can control it. With a garbage collector, you don’t have that control. Even if you try to avoid freeing a single large structure at once, it’s still liable to batch up a lot of small objects to free at once, so the problem can still happen. -- http://mail.python.org/mailman/listinfo/python-list
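[Editorial note: Lawrence's claim that the pause point is controllable under refcounting can be sketched: instead of one monolithic `del`, drop references in bounded slices. The helper name and chunk size below are illustrative choices, not from any library.]

```python
def free_in_chunks(lst, chunk=100000):
    # Release the list's elements a slice at a time, trading one long
    # pause for many short, schedulable ones. Each `del` decrefs only
    # `chunk` elements before control returns to the caller's loop.
    while lst:
        del lst[-chunk:]

# Distinct objects, so each element carries real refcount work.
big = [[] for _ in range(10**6)]
free_in_chunks(big)
assert big == []
```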
Re: Queue cleanup
In message 7xmxs2uez1@ruckus.brouhaha.com, Paul Rubin wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Whereas garbage collection will happen at some indeterminate time long after the last access to the object, when it very likely will no longer be in the cache, and have to be brought back in just to be freed, GC's for large systems generally don't free (or even examine) individual garbage objects. They copy the live objects to a new contiguous heap without ever touching the garbage, and then they release the old heap. And suddenly you’ve doubled the memory requirements. And on top of that, since you’re moving the valid objects into different memory, you’re forcing cache misses on all of them as well. This is the continuing problem with garbage collection: all the attempts to make it cheaper just end up moving the costs somewhere else. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7xr5heufhb@ruckus.brouhaha.com, Paul Rubin wrote: Java has considerably greater reputation for reliability than C or C++. Wonder why Sun’s licence explicitly forbade its use in danger-critical areas like nuclear power plants and the like, then? Ada is a different story, but Ada programs (because of the application area Ada is used in) tend not to use a lot of dynamic memory allocation in the first place. A little googling shows there are GC extensions available for Ada, though I don't know if they are used much. Let’s put it this way: the life-support system on the International Space Station is written in Ada. Would you trust your life to code written in Java? -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Java has considerably greater reputation for reliability than C or C++. Wonder why Sun’s licence explicitly forbade its use in danger-critical areas like nuclear power plants and the like, then? Probably because Sun lawyers demanded it. Is there a Sun C or C++ compiler with a license that doesn't have that restriction? Even if there is, it just means those languages are so unreliable that the lawyers felt confident that any meltdown could be blamed on a bug in the user's code rather than the compiler ;-). Let’s put it this way: the life-support system on the International Space Station is written in Ada. Would you trust your life to code written in Java? The scary thing is I don't know whether I'm already doing that. Life support systems have hard real-time requirements (Ada's forte) but I'd expect lots of military decision-support systems are written in Java. Maybe one of them will raise a false alert and somebody will launch a war. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On 04/09/2010 03:21, Paul Rubin wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Java has considerably greater reputation for reliability than C or C++. Wonder why Sun’s licence explicitly forbade its use in danger-critical areas like nuclear power plants and the like, then? Probably because Sun lawyers demanded it. Is there a Sun C or C++ compiler with a license that doesn't have that restriction? Even if there is, it just means those languages are so unreliable that the lawyers felt confident that any meltdown could be blamed on a bug in the user's code rather than the compiler ;-). Let’s put it this way: the life-support system on the International Space Station is written in Ada. Would you trust your life to code written in Java? The scary thing is I don't know whether I'm already doing that. Life support systems have hard real-time requirements (Ada's forte) but I'd expect lots of military decision-support systems are written in Java. Maybe one of them will raise a false alert and somebody will launch a war. I thought it was just that if it wasn't explicitly forbidden then someone might try to use it and then sue if something went wrong, even though common sense would have said that it was a bad idea in the first place! :-) -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: GC's for large systems generally don't free (or even examine) individual garbage objects. They copy the live objects to a new contiguous heap without ever touching the garbage, and then they release the old heap. And suddenly you’ve doubled the memory requirements. And on top of that, since you’re moving the valid objects into different memory, you’re forcing cache misses on all of them as well. A minimal naive implementation indeed doubles the memory requirements, but from a Python perspective where every integer takes something like 24 bytes already, even that doesn't seem so terrible. More sophisticated implementations use multiple small heaps or other tricks. It still costs something in memory footprint, but less than the minimal implementation's 2x cost. The new heap is filled sequentially so accesses to it will have good locality. You do have to jump around within the old heap, but again, with generational schemes, in the more frequent collections, the old heap fits entirely in cache. For example, GHC's minor heap size is 256kB. For major collections, GHC switches (or used to) from copying to a mark/compact scheme once the amount of live data in the heap goes over some amount, giving the best of both worlds. It's also the case that programs with very large memory consumption tend to use most of the memory for large arrays that don't contain pointers (think of a database server with a huge cache). That means the gc doesn't really have to think about all that much of the memory. This is the continuing problem with garbage collection: all the attempts to make it cheaper just end up moving the costs somewhere else. Same thing with manual allocation. That moves the costs off the computer and onto the programmer. Not good, most of the time. Really, I'm no gc expert, but the stuff you're saying about gc is quite ill-informed. You might want to check out some current literature. 
-- http://mail.python.org/mailman/listinfo/python-list
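Paul's generational point above can be sketched in miniature. The toy model below (all names invented) ignores pointer tracing entirely, taking liveness directly as a root set of names, but it shows the shape of the scheme: frequent, cheap scans of a small nursery, with survivors promoted to an old generation that is scanned only rarely.

```python
class Heap:
    """Toy generational heap: liveness is given directly as a root set of
    names, so there is no pointer tracing at all (invented model)."""

    def __init__(self):
        self.nursery = []   # recently created objects: scanned often, cheaply
        self.old = []       # long-lived survivors: scanned rarely

    def allocate(self, obj):
        self.nursery.append(obj)
        return obj

    def minor_collect(self, roots):
        # Scan only the small nursery; survivors get promoted.
        self.old.extend(o for o in self.nursery if o in roots)
        self.nursery.clear()

    def major_collect(self, roots):
        # The infrequent full collection also scans the old generation.
        self.old = [o for o in self.old if o in roots]
```

Because most objects die young, most collections touch only the nursery; the old generation is visited only when a major collection runs, which is why the frequent collections can stay within cache-sized regions.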
Re: Queue cleanup
On 8/30/2010 12:22 AM, Paul Rubin wrote: I guess that is how the so-called smart pointers in the Boost C++ template library work. I haven't used them so I don't have personal experience with how convenient or reliable they are, or what kinds of constraints they imposed on programming style. I've always felt a bit suspicious of them though, and I seem to remember Alex Martelli (I hope he shows up here again someday) advising against using them. Smart pointers in C++ have never quite worked right. They almost work. But there always seems to be something that needs access to a raw C pointer, which breaks the abstraction. The mold keeps creeping through the wallpaper. Also, since they are a bolt-on at the macro level in C++, reference count updates aren't optimized and hoisted out of loops. (They aren't in CPython either, but there have been reference counted systems that optimize out most reference count updates.) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Dennis Lee Bieber wlfr...@ix.netcom.com writes: GC's for large systems ... copy the live objects to a new contiguous heap That sounds suspiciously like the original Macintosh OS, with its handles... IE, double-indirection. Nah, a double indirection on every access would be a terrible performance hit. The classic approach is when you move an object to the new heap, you leave a tagged forwarding pointer at its former location in the old heap, giving its location in the new heap. Then as you move other objects, you dereference the pointers in them to see whether they point to moved or unmoved objects, and relocate any unmoved ones. A more complete explanation is here: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-33.html#%_sec_5.3.2 -- http://mail.python.org/mailman/listinfo/python-list
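The forwarding-pointer scheme Paul describes can be sketched in a few lines of Python. This is a toy model, not how any real collector is coded: the "heap" is a list, a "pointer" is just an index into it, and the forwarding pointers live in a side table rather than in the old objects' slots.

```python
def stop_and_copy(old_heap, roots):
    """Cheney-style copying collection over a toy heap.

    old_heap: list of (data, [indices]) nodes; a 'pointer' is an index.
    Returns the new compacted heap and the relocated root pointers.
    """
    new_heap = []
    forward = {}                        # old index -> new index

    def move(i):
        if i in forward:                # already copied: follow the
            return forward[i]           # forwarding pointer
        data, refs = old_heap[i]
        forward[i] = len(new_heap)
        new_heap.append((data, list(refs)))
        return forward[i]

    new_roots = [move(i) for i in roots]
    # Cheney scan: the new heap itself is the work queue; everything an
    # already-copied object points at gets copied in turn.
    scan = 0
    while scan < len(new_heap):
        data, refs = new_heap[scan]
        new_heap[scan] = (data, [move(r) for r in refs])
        scan += 1
    return new_heap, new_roots
```

Note that garbage is never visited: only live nodes are moved, so the cost is proportional to the amount of live data, not to the heap size.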
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: And yet Java code, for example, doesn’t have a reputation for greater reliability compared to, say code written in Ada or C++, or even C. What is the Java runtime written in? C. Why not use Java, if there is no speed penalty in doing so? The Java runtime (such as the garbage collector) needs untyped pointers and can't be written in Java. It might be possible to write a type-safe GC in something like ATS, which has extremely precise types, but that is almost alien technology by comparison to writing in C. Java has considerably greater reputation for reliability than C or C++. Ada is a different story, but Ada programs (because of the application area Ada is used in) tend not to use a lot of dynamic memory allocation in the first place. A little googling shows there are GC extensions available for Ada, though I don't know if they are used much. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Whereas garbage collection will happen at some indeterminate time long after the last access to the object, when it very likely will no longer be in the cache, and have to be brought back in just to be freed, GC's for large systems generally don't free (or even examine) individual garbage objects. They copy the live objects to a new contiguous heap without ever touching the garbage, and then they release the old heap. That has the effect of improving locality, since the new heap is compacted and has no dead objects. The algorithms are generational (they do frequent gc's on recently-created objects and less frequent ones on older objects), so minor gc operations are on regions that fit in cache, while major ones might have cache misses but are infrequent. Non-compacting reference counting (or simple mark/sweep gc) has much worse fragmentation and consequently worse cache locality than copying-style gc. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Refcounting is susceptible to the same pauses for reasons already discussed. Doesn’t seem to happen in the real world, though.

def f(n):
    from time import time
    a = [1] * n
    t0 = time()
    del a
    t1 = time()
    return t1 - t0

for i in range(9):
    print i, f(10**i)

on my system prints:

0 2.86102294922e-06
1 2.14576721191e-06
2 3.09944152832e-06
3 1.00135803223e-05
4 0.000104904174805
5 0.00098991394043
6 0.00413608551025
7 0.037693977356
8 0.362598896027

Looks pretty linear as n gets large. 0.36 seconds (the last line) is a noticeable pause. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7xtymbzixt@ruckus.brouhaha.com, Paul Rubin wrote: It's pretty well established by now that GC doesn't have any significant speed penalty compared with manual allocation. It does consume more memory, which is acceptable a lot of the time. It certainly leads to more reliable programs. And yet Java code, for example, doesn’t have a reputation for greater reliability compared to, say code written in Ada or C++, or even C. What is the Java runtime written in? C. Why not use Java, if there is no speed penalty in doing so? -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7x39tz42fd@ruckus.brouhaha.com, Paul Rubin wrote: Dennis Lee Bieber wlfr...@ix.netcom.com writes: Heap marking, OTOH, tends to run at indeterminate times, which could have an impact if one needs predictable response timings Reference counting has the same problem. If you drop the last reference to a complex structure, it could take quite a long time to free all the components. One difference is the interaction with caching behaviour. When a reference- counted object is freed, the odds are that happens fairly soon after the last access, so the object will still be in the CPU cache, and access will be fast. Whereas garbage collection will happen at some indeterminate time long after the last access to the object, when it very likely will no longer be in the cache, and have to be brought back in just to be freed, quite likely bumping out something else that the running program needs to access. This is one reason why garbage collection is still considered an expensive technique. Computing power has improved by something like five orders of magnitude over the last half-century, making possible all kinds of productivity-enhancing techniques that were once considered expensive to become commonplace: high-level languages, dynamic memory allocation, stacks, hardware floating-point, memory protection and so on. But alone of all of these, garbage collection still remains just as costly to implement as ever. That should tell you something about how poorly it matches the characteristics of modern computing hardware. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7xmxs4uzjl@ruckus.brouhaha.com, Paul Rubin wrote: Gregory Ewing greg.ew...@canterbury.ac.nz writes: I'd be disappointed if CPython ditched refcounting and then became unsuitable for real-time games as a result. Refcounting is susceptible to the same pauses for reasons already discussed. Doesn’t seem to happen in the real world, though. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

static void GetBufferInfo
  (
    ...
    do /*once*/
      {
        TheBufferInfo = PyObject_CallMethod(FromArray, "buffer_info", "");
        if (TheBufferInfo == 0)
            break;
        AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
        LenObj = PyTuple_GetItem(TheBufferInfo, 1);
        if (PyErr_Occurred())
            break;
        ...
        Py_INCREF(AddrObj);
        Py_INCREF(LenObj);
      }
    while (false);
    Py_XDECREF(AddrObj);
    Py_XDECREF(LenObj);
    Py_XDECREF(TheBufferInfo);
  } /*GetBufferInfo*/

It’s quite easy to assure yourself that this is never going to leak memory.

Actually that code looks suspicious. Suppose in

    AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
    LenObj = PyTuple_GetItem(TheBufferInfo, 1);

the first PyTuple_GetItem succeeds and the second one fails. Then AddrObj is a borrowed reference to the first tuple element and LenObj is null, the error flag is set, so you break out of the do/while. You then decrement the refcount of AddrObj even though you didn't increment it. Maybe there's an explanation that makes it ok somehow, but it still looks messy. This is the kind of problem I'm referring to in general. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Steven D'Aprano steve-remove-t...@cybersource.com.au writes: I'm not saying that ref counting systems can avoid incrementing and decrementing the ref counts. That would be silly. But I'm saying that it is an accident of implementation that writing C extensions requires you to manually manage the counts yourself. That's a side-effect of permitting coders to write C extensions in pure C, rather than in some intermediate language which handles the ref counts for you but otherwise compiles like C. If people cared enough, they could (probably) make the C compiler handle it for them, just as it currently handles incrementing and decrementing the return stack. I guess that is how the so-called smart pointers in the Boost C++ template library work. I haven't used them so I don't have personal experience with how convenient or reliable they are, or what kinds of constraints they imposed on programming style. I've always felt a bit suspicious of them though, and I seem to remember Alex Martelli (I hope he shows up here again someday) advising against using them. I don't think a C compiler could really manage automatic decrementing while still being C. Think especially of the common style of exception handling in C using longjmp. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Mon, 30 Aug 2010 00:22:17 -0700, Paul Rubin wrote: I don't think a C compiler could really manage automatic decrementing while still being C. Think especially of the common style of exception handling in C using longjmp. You might very well be right. But that's the problem with C -- it's too low a level language to expect the compiler to protect you much. Or at all. There will always be some use cases for managing memory yourself, or even managing the return stack (as you can do in Forth, for example), and so there will always need to be some sort of high-level assembler like C. But it astounds me that in 2010 people still routinely use C for normal, everyday application programming. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Paul Rubin wrote: These days I think the GC pause issue is overrated except for real-time control applications. Also for games, which are a fairly common application these days. Even a few milliseconds can be too long when you're trying to achieve smooth animation. I'd be disappointed if CPython ditched refcounting and then became unsuitable for real-time games as a result. -- Greg -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7xr5hg3a7s@ruckus.brouhaha.com, Paul Rubin wrote: Actually that code looks suspicious. Suppose in AddrObj = PyTuple_GetItem(TheBufferInfo, 0); LenObj = PyTuple_GetItem(TheBufferInfo, 1); the first PyTuple_GetItem succeeds and the second one fails. Admittedly, I did take a shortcut here: array.buffer_info returns a tuple of two items, so I’m not expecting one GetItem to succeed and the other to fail. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In message 7x39twpuxi@ruckus.brouhaha.com, Paul Rubin wrote: Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: the CPython API means endlessly finding and fixing refcount bugs that lead to either crashes/security failures, or memory leaks. I don’t see why that should be so. It seems a very simple discipline to follow: initialize, allocate, free. Here’s an example snippet from my DVD Menu Animator http://github.com/ldo/dvd_menu_animator: In practice it has been a problem. Maybe people need to learn to write code in a more structured fashion. If it hasn't happened to you yet, you're either burning a bunch of effort that programmers of more automatic systems can put to more productive uses ... What makes you say that? Avoiding bugs is not a “productive use”? And how do you run such an application? You have to limit it to a predetermined amount of memory to begin with, otherwise it would easily gobble up everything you have. No, that's usually not a problem -- the runtime system (generational gc) can figure out enough from your allocation pattern to prevent the heap from getting overlarge. And yet Java apps, for example, are (in)famous for excessive memory usage compared to those written in non-GC-dependent languages. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: AddrObj = PyTuple_GetItem(TheBufferInfo, 0); LenObj = PyTuple_GetItem(TheBufferInfo, 1); the first PyTuple_GetItem succeeds and the second one fails. Admittedly, I did take a shortcut here: array.buffer_info returns a tuple of two items, so I’m not expecting one GetItem to succeed and the other to fail. FromArray is a parameter to the function, with no type check to make sure it's really an array. In fact your code allows for the possibility that it doesn't support the buffer_info operation (if I understand the purpose of the null return check after the PyObject_CallMethod) which means it's prepared for the argument to -not- be an array. In that case maybe it's some other object with a buffer_info operation that returns a 1-element tuple. If the function is callable from Python code, then that arg type is completely out of the C code's control. Even if it's only callable from C, you're still depending on not one but two different invariants (that the arg is an array, and that array.buffer_info returns a 2-tuple) that are undocumented and unchecked in the function. I cannot agree with your claim that the approach scales. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: If it hasn't happened to you yet, you're either burning a bunch of effort that programmers of more automatic systems can put to more productive uses ... What makes you say that? Avoiding bugs is not a “productive use”? Avoiding any particular bug through constant and pervasive vigilance is far less productive than using a system where causing that particular type of bug is impossible to begin with. IMO the code you posted has latent bugs as discussed in the other post. It might work at the moment you checked it in but it is brittle. I wouldn't have signed off on it in a code review. And yet Java apps, for example, are (in)famous for excessive memory usage compared to those written in non-GC-dependent languages. I think that may mostly be an issue with the bloated nature of most Java apps. Certainly Lisp systems have run in production for decades on machines with much less memory than we would consider acceptable these days for any substantial Python app. It's probably true that copying-style gc is more memory hungry than Python's refcount system. Mark-sweep gc should have comparable memory consumption and better speed than refcounting, though less speed than copying. JHC (experimental Haskell compiler) recently started using a mixture of gc and region inference. It will be interesting to see how that works out. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Gregory Ewing greg.ew...@canterbury.ac.nz writes: These days I think the GC pause issue is overrated except for real-time control applications. Also for games, which are a fairly common application these days. Even a few milliseconds can be too long when you're trying to achieve smooth animation. The usual hack with games is you do a major gc when the user advances between game levels. You can do minor gc's during the screen refresh interval. I'd be disappointed if CPython ditched refcounting and then became unsuitable for real-time games as a result. Refcounting is susceptible to the same pauses for reasons already discussed. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In article 4c7b279d$0$28650$c3e8...@news.astraweb.com, Steven D'Aprano steve-remove-t...@cybersource.com.au wrote: On Sun, 29 Aug 2010 17:52:38 -0700, Paul Rubin wrote: Attribution lost: That's a problem with the CPython API, not reference counting. The problem is that the CPython API is written at too low a level, beneath that at which the garbage collector exists, so naturally you have to manually manage memory. Care to give an example of a reference counted system that's written any other way? The complexity of the ref counter is invisible when writing pure Python code, and I believe it is also invisible when writing code in Cython. The difficulty of dealing with ref counts is abstracted away. That's not completely true. You know perfectly well that it's almost trivially easy to leak memory with refcounting, and there are certain ways in which Python leaks memory invisibly if you don't know how it works. One recent example at work was when someone was arguing with me about whether we were leaking file handles and I had to prove that you could leak file handles if you didn't clean up exceptions -- but that cleaning up the exception *did* close the file handles. (This code was originally written in Python 2.4, and partly because of this we are making more of a push to use with) If you're restricting your claim just to the actual management of the reference counter, you're correct, but it's especially not clear that your second sentence is so restricted. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ ...if I were on life-support, I'd rather have it run by a Gameboy than a Windows box. --Cliff Wells -- http://mail.python.org/mailman/listinfo/python-list
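The file-handle situation Aahz describes is easy to reproduce in miniature. In the sketch below (all names hypothetical), a saved exception's traceback keeps the failing function's frame, and hence its local resource, alive; explicitly dropping the saved exception is exactly the "cleaning up the exception" that frees the resource under CPython's refcounting.

```python
import sys
import weakref

class Handle:
    """Stand-in for a file handle or similar resource (hypothetical)."""

def open_and_fail():
    h = Handle()
    open_and_fail.ref = weakref.ref(h)   # watch the resource from outside
    raise RuntimeError("simulated I/O error")

def demo():
    try:
        open_and_fail()
    except RuntimeError:
        saved = sys.exc_info()           # traceback keeps open_and_fail's
                                         # frame, and so its local h, alive
    leaked = open_and_fail.ref() is not None
    del saved                            # "clean up the exception"
    freed = open_and_fail.ref() is None  # refcounting frees it immediately
    return leaked, freed
```

The with statement mentioned above sidesteps the whole issue by closing the resource deterministically, whether or not anything still references the traceback.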
Re: Queue cleanup
In message 7xr5hguzzi@ruckus.brouhaha.com, Paul Rubin wrote: JHC (experimental Haskell compiler) recently started using a mixture of gc and region inference. It will be interesting to see how that works out. That’s what people have been saying about garbage collection for about half a century: “this new experimental technique will solve all the problems, just you wait and see”. Meanwhile, real-world programmers get on to writing real-world code that is productive and efficient on real-world systems. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: Meanwhile, real-world programmers get on to writing real-world code that is productive and efficient on real-world systems. It's pretty well established by now that GC doesn't have any significant speed penalty compared with manual allocation. It does consume more memory, which is acceptable a lot of the time. It certainly leads to more reliable programs. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Steven D'Aprano wrote: On Sat, 28 Aug 2010 00:33:10 -0700, Paul Rubin wrote: If you drop the last reference to a complex structure, it could take quite a long time to free all the components. By contrast there are provably real-time tracing gc schemes, including some parallelizable ones. I could be wrong, but how can they not be subject to the same performance issue? If you have twenty thousand components that all have to be freed, they all have to be freed whether you do it when the last reference is cleared, or six seconds later when the gc does a sweep. Parallelizable garbage collectors have performance issues, but they're not the same issues as mark-sweep collectors have. Parallelizable GCs break up their work in a zillion little pieces and allow the VM to do some real work after each piece. They won't free your twenty thousand components all in one go and you won't have that embarrassing pause. Parallelizable garbage collectors require some careful coordination between the GC and the VM. This takes CPU time, so on the whole they're slower than traditional garbage collectors. So instead of unpredictable embarrassing pauses, you have a VM that's consistently slow. For some applications consistency is more important than raw speed and for these applications parallelizable GCs are an improvement. HTH, -- HansM -- http://mail.python.org/mailman/listinfo/python-list
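The "zillion little pieces" idea can be sketched as a marker that processes a bounded number of objects per slice and yields control between slices. This toy version (invented names) only illustrates how the pause gets bounded; a real incremental collector must also use a write barrier to catch pointers the mutator creates mid-collection.

```python
class Node:
    """Toy heap object holding references to other nodes."""
    def __init__(self, children=()):
        self.children = list(children)

def incremental_mark(roots, step=100):
    """Mark reachable nodes at most `step` at a time, yielding the running
    count between slices so the mutator can run during the gaps."""
    marked = set()
    gray = list(roots)                 # found but not yet scanned
    while gray:
        for _ in range(min(step, len(gray))):
            node = gray.pop()
            if id(node) in marked:
                continue
            marked.add(id(node))
            gray.extend(node.children)
        yield len(marked)              # bounded pause ends here
```

Driving the generator interleaves collection with real work, which is the trade Hans describes: no single long pause, at the cost of bookkeeping on every slice boundary.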
Re: Queue cleanup
Hans Mulder han...@xs4all.nl writes: Parallelizable garbage collectors have performance issues, but they're not the same issues as marksweep collectors have. Parallelizable GCs break up their work in a zillion little pieces and allow the VM to do some real work after each piece. They won't free your twenty thousand components all in one go and you won't have that embarrassing pause. Quibble: parallel GC just means any GC that runs in multiple threads simultaneously. A GC that guarantees no pauses (more technically, specifies a constant such that any GC pause is guaranteed to be shorter than the constant) is called a real time GC, and real-time GC's are usually single threaded. Parallel real-time GC's were once sort of a holy grail, though workable ones have been implemented. GHC for example currently uses a GC that is parallel (runs on multiple cores for speed) but is not real-time (there can be a pause), and I think the Sun JVM is the same way. These days I think the GC pause issue is overrated except for real-time control applications. GC's no longer frequently make the program freeze up for seconds at a time or anything like that. It was worse in the old days when CPU's were slower and memory was scarcer. Serious GC's are usually generational, with minor GC's taking microseconds and less frequent major GC's taking fractions of a second. So in an interactive program or animation on your desktop you might notice a little hiccup once in a while. For something like a web server an occasional slight variation in response time could easily be random network delays and so forth. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Steven D'Aprano st...@remove-this-cybersource.com.au writes: You can add cycle detection to a reference count gc, at the cost of more complexity. But then it's not purely a refcount gc. ;) If you read the Wikipedia article I linked to, tracing algorithms can also be unsound: [describes conservative gc] Yeah, whether that counts as a real GC is subjective too. and 2) it requires constant attention from the mutator to incr and decr the reference counts. Yes. And? I thought I made it clear, the purpose of gc is to improve programmer productivity and program reliability by relieving the programmer from the burden of memory allocation bookkeeping. If the programmer has to painstakingly manage refcounts by hand and the resulting programs are still prone to refcount bugs (both of which are the case with CPython), it's not really gc in a way that lives up to the name. That's a problem with the CPython API, not reference counting. The problem is that the CPython API is written at too low a level, beneath that at which the garbage collector exists, so naturally you have to manually manage memory. Care to give an example of a reference counted system that's written any other way? On the other hand, tracing gcs have their own set of problems too, mostly due to the use of finalizers and attempts to make garbage collection run more predictably. See here: I think it's fairly common wisdom that finalizers and gc don't interact very well. Finalizers in general aren't in the gc spirit, which says the system should give the illusion that every object stays around forever. As for stuff like hash tables, a usual way to help out the gc is by populating the hash table with weak references, rather than by clearing it out manually when you're done with it. It's also possible for fancy compilers to use a mixture of gc and stack- or region-based allocation. Tracing garbage collectors aren't a panacea. 
They're software themselves, and complex software, which means they're subject to bugs like the one ... You could say the same thing about compilers instead of gc's. The idea in both cases is yes, they're complex, but they put the complexity in one place, so that the application program can rely on the complicated gc or compiler features while itself staying simple. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Steven D'Aprano st...@remove-this-cybersource.com.au writes: I could be wrong, but how can they not be subject to the same performance issue? If you have twenty thousand components that all have to be freed, they all have to be freed whether you do it when the last reference is cleared, or six seconds later when the gc does a sweep. GC's on large machines these days do not sweep at all. They copy the live data to a new heap, then release the old heap. Because there is usually more garbage than live data, copying GC's that never touch the garbage are usually faster than any scheme involving freeing unused objects one by one. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Sun, 29 Aug 2010 17:52:38 -0700, Paul Rubin wrote: That's a problem with the CPython API, not reference counting. The problem is that the CPython API is written at too low a level, beneath that at which the garbage collector exists, so naturally you have to manually manage memory. Care to give an example of a reference counted system that's written any other way? The complexity of the ref counter is invisible when writing pure Python code, and I believe it is also invisible when writing code in Cython. The difficulty of dealing with ref counts is abstracted away. The argument that it will work if we always do this means that it won't work is a nice quip, but it doesn't stand up. It's possible to defeat even Pascal compilers' type checking and do unsafe things by dropping down into assembly. So don't do it! It's not hard to not do something, although sometimes you might choose to do it anyway, because otherwise there is no way to get the code you need/want. But such code should be a rare exception. I'm not saying that ref counting systems can avoid incrementing and decrementing the ref counts. That would be silly. But I'm saying that it is an accident of implementation that writing C extensions requires you to manually manage the counts yourself. That's a side-effect of permitting coders to write C extensions in pure C, rather than in some intermediate language which handles the ref counts for you but otherwise compiles like C. If people cared enough, they could (probably) make the C compiler handle it for them, just as it currently handles incrementing and decrementing the return stack. There's nothing *fundamental* to the idea of ref counting that says that you must handle the counts manually -- it depends on how well insulated you are from the underlying machine. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
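The transparency Steven describes can actually be observed from pure Python via sys.getrefcount, a CPython-specific introspection hook. The interpreter performs every increment and decrement itself; the programmer only sees the counts change. Note that the reported count includes the temporary reference created for the call itself.

```python
import sys

a = []
base = sys.getrefcount(a)    # CPython-specific; includes the temporary
                             # reference made for the call itself
b = a                        # plain assignment: the interpreter does the
                             # INCREF, not the programmer
after_alias = sys.getrefcount(a)
del b                        # and the matching DECREF
after_del = sys.getrefcount(a)
```

This is exactly the bookkeeping a C extension author must write out by hand as Py_INCREF/Py_DECREF calls.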
Re: Queue cleanup
In message 7x4oeftuk4@ruckus.brouhaha.com, Paul Rubin wrote: I'd say [reference-counting is] not real gc because 1) it's unsound (misses reference cycles), and 2) it requires constant attention from the mutator to incr and decr the reference counts. So developing modules for the CPython API means endlessly finding and fixing refcount bugs that lead to either crashes/security failures, or memory leaks. I don’t see why that should be so. It seems a very simple discipline to follow: initialize, allocate, free. Here’s an example snippet from my DVD Menu Animator http://github.com/ldo/dvd_menu_animator:

static void GetBufferInfo
  (
    PyObject * FromArray,
    unsigned long * addr,
    unsigned long * len
  )
  /* returns the address and length of the data in a Python array object. */
  {
    PyObject * TheBufferInfo = 0;
    PyObject * AddrObj = 0;
    PyObject * LenObj = 0;
    do /*once*/
      {
        TheBufferInfo = PyObject_CallMethod(FromArray, "buffer_info", "");
        if (TheBufferInfo == 0)
            break;
        AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
        LenObj = PyTuple_GetItem(TheBufferInfo, 1);
        if (PyErr_Occurred())
            break;
        Py_INCREF(AddrObj);
        Py_INCREF(LenObj);
        *addr = PyInt_AsUnsignedLongMask(AddrObj);
        *len = PyInt_AsUnsignedLongMask(LenObj);
        if (PyErr_Occurred())
            break;
      }
    while (false);
    Py_XDECREF(AddrObj);
    Py_XDECREF(LenObj);
    Py_XDECREF(TheBufferInfo);
  } /*GetBufferInfo*/

It’s quite easy to assure yourself that this is never going to leak memory. More complicated examples can simply nest constructs like these one within the other to arbitrary depth, while still giving the same assurance at every level. In short, this technique scales well. If you program the Java JNI or a typical Lisp FFI, you'll find that real gc is a lot simpler to use since you avoid all the refcount maintenance hassles. You allocate memory and shut your eyes, and the gc takes care of freeing it when it figures out that you are done. And how do you run such an application? 
You have to limit it to a predetermined amount of memory to begin with, otherwise it would easily gobble up everything you have. In the old days of “classic” MacOS, every application had to run in a fixed-size application heap. I have no wish to return to those days. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes: the CPython API means endlessly finding and fixing refcount bugs that lead to either crashes/security failures, or memory leaks. I don’t see why that should be so. It seems a very simple discipline to follow: initialize, allocate, free. Here’s an example snippet from my DVD Menu Animator http://github.com/ldo/dvd_menu_animator: In practice it has been a problem. If it hasn't happened to you yet, you're either burning a bunch of effort that programmers of more automatic systems can put to more productive uses, or else you just haven't written enough such code to have gotten bitten yet. You allocate memory and shut your eyes, and the gc takes care of freeing it when it figures out that you are done. And how do you run such an application? You have to limit it to a predetermined amount of memory to begin with, otherwise it would easily gobble up everything you have. No that's usually not a problem-- the runtime system (generational gc) can figure out enough from your allocation pattern to prevent the heap from getting overlarge. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Dennis Lee Bieber wlfr...@ix.netcom.com writes: The nice thing about it [reference counting] is that it is sort of deterministic -- one can examine code and determine that an object is collected at some point in the execution... Heap marking, OTOH, tends to run at indeterminate times, which could have an impact if one needs predictable response timings. Reference counting has the same problem. If you drop the last reference to a complex structure, it could take quite a long time to free all the components. By contrast there are provably real-time tracing gc schemes, including some parallelizable ones. One reason CPython still can't run threads on parallel cores is it would have to lock the reference counts every time they're updated, and the slowdown from that is terrible. -- http://mail.python.org/mailman/listinfo/python-list
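[The determinism Dennis describes is easy to observe in CPython, where a `weakref.finalize` callback fires the instant the last reference disappears, rather than at some later sweep. A small sketch of CPython behaviour; other implementations may defer reclamation:]

```python
import weakref

class Node:
    """A trivial object whose reclamation we can watch."""

collected = []

node = Node()
weakref.finalize(node, collected.append, "freed")  # runs when node is reclaimed

assert collected == []   # still alive: the name 'node' holds a reference
del node                 # last reference dropped: freed immediately under refcounting
assert collected == ["freed"]
```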
Re: Queue cleanup
On Sat, 28 Aug 2010 00:33:10 -0700, Paul Rubin wrote: Dennis Lee Bieber wlfr...@ix.netcom.com writes: The nice thing about it [reference counting] is that it is sort of deterministic -- one can examine code and determine that an object is collected at some point in the execution... Heap marking, OTOH, tends to run at indeterminate times, which could have an impact if one needs predictable response timings Reference counting has the same problem. In theory, yes, but in practice ref counting tends to spread out the performance impact more smoothly. There are exceptions, such as the one you mention below, but as a general rule ref counting isn't subject to the embarrassing pauses that tracing garbage collectors tend to be subject to. If you drop the last reference to a complex structure, it could take quite a long time to free all the components. By contrast there are provably real-time tracing gc schemes, including some parallelizable ones. I could be wrong, but how can they not be subject to the same performance issue? If you have twenty thousand components that all have to be freed, they all have to be freed whether you do it when the last reference is cleared, or six seconds later when the gc does a sweep. One reason CPython still can't run threads on parallel cores is it would have to lock the reference counts every time they're updated, and the slowdown from that is terrible. On the other hand, the reason that CPython still has reference counting is that the alternatives tried so far are unacceptably slow for non-threaded code. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Fri, 27 Aug 2010 18:06:19 -0700, Paul Rubin wrote: Steven D'Aprano st...@remove-this-cybersource.com.au writes: I've repeatedly asked, both here and elsewhere, why reference counting isn't real garbage collection. Nobody has been able to give me a satisfactory answer. As far as I can tell, it's a bit of pretentiousness with no basis in objective fact. Well, it's a bit of a subjective matter. I'd say it's not real gc because 1) it's unsound (misses reference cycles), You can add cycle detection to a reference count gc, at the cost of more complexity. If you read the Wikipedia article I linked to, tracing algorithms can also be unsound: Some collectors running in a particular environment can correctly identify all pointers (references) in an object; these are called precise (also exact or accurate) collectors, the opposite being a conservative or partly conservative collector. Conservative collectors have to assume that any bit pattern in memory could be a pointer if (when interpreted as a pointer) it would point into any allocated object. Thus, conservative collectors may have some false negatives, where storage is not released because of accidental fake pointers... http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) and 2) it requires constant attention from the mutator to incr and decr the reference counts. Yes. And? So developing modules for the CPython API means endlessly finding and fixing refcount bugs that lead to either crashes/security failures, or memory leaks. If you program the Java JNI or a typical Lisp FFI, you'll find that real gc is a lot simpler to use since you avoid all the refcount maintenance hassles. You allocate memory and shut your eyes, and the gc takes care of freeing it when it figures out that you are done. Refcounting is basically a form of manual memory management, while gc is automatic. That's a problem with the CPython API, not reference counting. 
The problem is that the CPython API is written at too low a level, beneath that at which the garbage collector exists, so naturally you have to manually manage memory. Someone said here recently that as a program gets larger, saying "this will work as long as we do X every time without fail" becomes equal to saying "this won't work". Substitute "properly maintain all ref counts" for X and you can see the problem. I've seen released, production-tested Python C modules with subtle refcount bugs on more than one occasion. In gc'd systems there are fewer places for the code to go wrong. On the other hand, tracing gcs have their own set of problems too, mostly due to the use of finalizers and attempts to make garbage collection run more predictably. See here: http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/topic/com.ibm.java.doc.diagnostics.142j9/html/coexistwithgc.html Quote: For tidying Java resources, think about the use of a clean up routine. When you have finished with an object, call the routine to null out all references, deregister listeners, clear out hash tables, and so on. This is far more efficient than using a finalizer and has the useful side-benefit of speeding up garbage collection. The Garbage Collector does not have so many object references to chase in the next garbage collection cycle. Translated: Rather than relying on the garbage collector to clean up resources after you, do it yourself, manually, so the garbage collector has less work to do. Tracing garbage collectors aren't a panacea. They're software themselves, and complex software, which means they're subject to bugs like the one which plagued Flash plugin 9: http://gskinner.com/blog/archives/2008/04/failure_to_unlo.html The more complicated the garbage collector, the more scope you have for some interaction between your high-level code and the gc leading to memory not being reclaimed, or extreme slowdown. 
Like this: http://tech.puredanger.com/2009/02/11/linkedblockingqueue-garbagecollection/ -- Steven -- http://mail.python.org/mailman/listinfo/python-list
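[The IBM advice quoted above maps directly onto Python practice: rather than hanging resource release off a finalizer and hoping the collector gets there soon, give the object an explicit close() plus context-manager support. A sketch of the usual pattern; the Resource class here is invented purely for illustration:]

```python
class Resource:
    """A made-up resource holder, cleaned up explicitly rather than by the gc."""

    def __init__(self):
        self.open = True
        self.listeners = []

    def close(self):
        # explicit cleanup: drop references and deregister listeners ourselves,
        # instead of leaving that work to a finalizer
        self.listeners.clear()
        self.open = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()   # runs deterministically when the with-block exits

with Resource() as r:
    r.listeners.append(lambda: None)

assert r.open is False
assert r.listeners == []
```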
Re: Queue cleanup
In article 4c78572c$0$28655$c3e8...@news.astraweb.com, Steven D'Aprano st...@remove-this-cybersource.com.au wrote: On Fri, 27 Aug 2010 09:16:52 -0700, Aahz wrote: In article mailman.1967.1281549328.1673.python-l...@python.org, MRAB pyt...@mrabarnett.plus.com wrote: An object will be available for garbage collection when nothing refers to it either directly or indirectly. If it's unreferenced then it will go away. This isn't actually garbage collection as most people think of it. Refcounting semantics mean that objects get reaped as soon as nothing points at them. OTOH, CPython does also have garbage collection to back up refcounting so that when you have unreferenced object cycles they don't stay around. I've repeatedly asked, both here and elsewhere, why reference counting isn't real garbage collection. Nobody has been able to give me a satisfactory answer. As far as I can tell, it's a bit of pretentiousness with no basis in objective fact. You'll notice that I was very careful to qualify my statement with "as most people think of it". Also, because CPython has two different memory management mechanisms, refcounting and cycle detection, and the module that controls cycle detection is called gc, I think it's simpler to follow along with the Python docs -- and critically important to remind people that there are in fact two different systems. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ ...if I were on life-support, I'd rather have it run by a Gameboy than a Windows box. --Cliff Wells -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In article 4c78e7a5$0$28655$c3e8...@news.astraweb.com, Steven D'Aprano st...@remove-this-cybersource.com.au wrote: On the other hand, the reason that CPython still has reference counting is that the alternatives tried so far are unacceptably slow for non-threaded code. No, it's *a* reason, the other main reason being that refcounting is much easier for a random C library. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ ...if I were on life-support, I'd rather have it run by a Gameboy than a Windows box. --Cliff Wells -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
In article mailman.1967.1281549328.1673.python-l...@python.org, MRAB pyt...@mrabarnett.plus.com wrote: An object will be available for garbage collection when nothing refers to it either directly or indirectly. If it's unreferenced then it will go away. This isn't actually garbage collection as most people think of it. Refcounting semantics mean that objects get reaped as soon as nothing points at them. OTOH, CPython does also have garbage collection to back up refcounting so that when you have unreferenced object cycles they don't stay around. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ ...if I were on life-support, I'd rather have it run by a Gameboy than a Windows box. --Cliff Wells -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On 8/11/2010 1:26 PM, EW wrote: On Aug 11, 2:52 pm, Paul Rubin no.em...@nospam.invalid wrote: EW ericwoodwo...@gmail.com writes: Well I cared because I thought garbage collection would only happen when the script ended - the entire script. Since I plan on running this as a service it'll run for months at a time without ending. So I thought I was going to have heaps of Queues hanging out in memory, unreferenced and unloved. It seemed like bad practice so I wanted to get out ahead of it. Even if GC worked that way it wouldn't matter, if you use just one queue per type of task. That number should be a small constant so the memory consumption is small. Well I can't really explain it but 1 Queue per task for what I'm designing just doesn't feel right to me. It feels like it will lack future flexibility. I like having 1 Queue per producer thread object and the person instantiating that object can do whatever he wants with that Queue. I can't prove I'll need that level of flexibility but I don't see why it's bad to have. It's still a small number of Queues, it's just a small, variable, number of Queues. That's backwards. Usually, you want one queue per unique consumer. That is, if you have a queue that contains one kind of request, there's one thread reading the queue, blocked until some other thread puts something on the queue. No polling is needed. One consumer reading multiple queues is difficult to implement well. Note, by the way, that CPython isn't really concurrent. Only one thread runs at a time, due to an archaic implementation. So if your threads are compute-bound, even on a multicore CPU threading will not help. There's a multiprocessing module which allows spreading work over several processes instead of threads. That can be helpful as a workaround. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
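[The one-queue-per-kind-of-request shape John describes looks roughly like this in outline (Python 3 module spelling `queue`; in the Python 2 of this thread it was `Queue`). The consumer thread sleeps inside get() until work arrives, so no polling is needed:]

```python
import queue
import threading

requests = queue.Queue()   # one queue for one kind of request
log = []

def consumer():
    while True:
        msg = requests.get()   # blocks until another thread puts something; no polling
        log.append(msg)
        requests.task_done()   # tell the queue this item has been fully handled

threading.Thread(target=consumer, daemon=True).start()

for msg in ("a", "b", "c"):
    requests.put(msg)

requests.join()                # wait until every queued item has been processed
assert log == ["a", "b", "c"]
```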
Re: Queue cleanup
On Fri, 27 Aug 2010 09:16:52 -0700, Aahz wrote: In article mailman.1967.1281549328.1673.python-l...@python.org, MRAB pyt...@mrabarnett.plus.com wrote: An object will be available for garbage collection when nothing refers to it either directly or indirectly. If it's unreferenced then it will go away. This isn't actually garbage collection as most people think of it. Refcounting semantics mean that objects get reaped as soon as nothing points at them. OTOH, CPython does also have garbage collection to back up refcounting so that when you have unreferenced object cycles they don't stay around. I've repeatedly asked, both here and elsewhere, why reference counting isn't real garbage collection. Nobody has been able to give me a satisfactory answer. As far as I can tell, it's a bit of pretentiousness with no basis in objective fact. http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) http://en.wikipedia.org/wiki/Reference_counting Reference counting is one specific kind of garbage collection, and like all gc strategies, it has strengths as well as weaknesses. It is *simple* to implement (which may be why a certain class of programmer likes to think it isn't real gc). When it runs is deterministic, and collection is immediate once the object becomes unreferenced. The overhead is very light (a plus) and continuous (which can be both a plus and a minus). It is better suited to managing scarce resources like open files than are tracing garbage collectors. It avoids the embarrassing pause of tracing collectors. It doesn't deal well with reference cycles, and (at least with Python's implementation of ref counting) it causes performance issues with threaded applications. So CPython has two garbage collectors -- a reference counting implementation, and a tracing implementation. Jython and IronPython use the native garbage collectors from Java and .Net. 
Other Pythons may use something else. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
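[The two collectors Steven distinguishes can be told apart experimentally: a reference cycle is immune to the refcounter but not to the tracing collector in the gc module. A CPython-specific sketch, with automatic collections disabled so the demonstration is deterministic:]

```python
import gc
import weakref

class Node:
    pass

gc.disable()                  # isolate the demonstration from automatic collections

a, b = Node(), Node()
a.other, b.other = b, a       # a reference cycle: refcounts can never reach zero
alive = weakref.ref(a)

del a, b
assert alive() is not None    # refcounting alone leaks the cycle

gc.collect()                  # the tracing collector backs it up and reclaims it
assert alive() is None

gc.enable()
```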
Re: Queue cleanup
Steven D'Aprano st...@remove-this-cybersource.com.au writes: I've repeatedly asked, both here and elsewhere, why reference counting isn't real garbage collection. Nobody has been able to give me a satisfactory answer. As far as I can tell, it's a bit of pretentiousness with no basis in objective fact. Well, it's a bit of a subjective matter. I'd say it's not real gc because 1) it's unsound (misses reference cycles), and 2) it requires constant attention from the mutator to incr and decr the reference counts. So developing modules for the CPython API means endlessly finding and fixing refcount bugs that lead to either crashes/security failures, or memory leaks. If you program the Java JNI or a typical Lisp FFI, you'll find that real gc is a lot simpler to use since you avoid all the refcount maintenance hassles. You allocate memory and shut your eyes, and the gc takes care of freeing it when it figures out that you are done. Refcounting is basically a form of manual memory management, while gc is automatic. Someone said here recently that as a program gets larger, saying "this will work as long as we do X every time without fail" becomes equal to saying "this won't work". Substitute "properly maintain all ref counts" for X and you can see the problem. I've seen released, production-tested Python C modules with subtle refcount bugs on more than one occasion. In gc'd systems there are fewer places for the code to go wrong. -- http://mail.python.org/mailman/listinfo/python-list
Queue cleanup
Hi I'm writing a multithreaded app that relies on Queues to move data between the threads. I'm trying to write my objects in a general way so that I can reuse them in the future so I need to write them in such a way that I don't know how many producer and how many consumer threads I might need. I also might have different consumer threads do different tasks (for example one might write to a log and one might write to SQL) so that again means I can't plan for a set ratio of consumers to producers. So it's unknown. So this means that instead of having 1 Queue that all the producers put to and that all the consumers get from I actually have 1 Queue per producer thread that the main body sends to the correct type of consumer thread. So I could get something like this where 3 producer threads write to 3 different Queues all of which get read by 1 consumer thread:

P1  P2  P3
 \  |  /
  \ | /
   C1

So producers 1, 2, and 3 all write to individual Queues and consumer 1 has a list of those Queues and reads them all. The problem I'm having is that those producer threads can come and go pretty quickly and when they die I can cleanup the thread with join() but I'm still left with the Queue. So I could get something like this:

P1      P3
 \      /
  \    /
   C1

So here the P2 thread has ended and gone away but I still have his Queue lingering. So on a thread I can use is_alive() to check status and use join() to clean up but I don't see any analogous functionality for Queues. How do I kill them? I thought about putting a suicide message on the Queue and then C1 would read it and set the variable to None but I'm not sure setting the variable to None actually makes the Queue go away. It could just end up sitting in memory unreferenced - and that's not good. Additionally, I could have any number of consumer threads reading that Queue so once the first one gets the suicide note the other consumer threads never would. I figure there has to be an elegant way of managing my Queues but so far I can't find it. 
Any suggestions would be appreciated and thanks in advance for any help. ps Python rocks. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Aug 11, 12:55 pm, EW ericwoodwo...@gmail.com wrote: Hi I'm writing a multithreaded app that relies on Queues to move data between the threads. I'm trying to write my objects in a general way so that I can reuse them in the future so I need to write them in such a way that I don't know how many producer and how many consumer threads I might need. I also might have different consumer threads do different tasks (for example one might write to a log and one might write to SQL) so that again means I can't plan for a set ratio of consumers to producers. So it's unknown. So this means that instead of having 1 Queue that all the producers put to and that all the consumers get from I actually have 1 Queue per producer thread that the main body sends to the correct type of consumer thread. So I could get something like this where 3 producer threads write to 3 different Queues all of which get read by 1 consumer thread: P1 P2 P3 \ | / \ | / C1 So producers 1, 2, and 3 all write to individual Queues and consumer 1 had a list of those Queues and reads them all. The problem I'm having is that those producer threads can come and go pretty quickly and when they die I can cleanup the thread with join() but I'm still left with the Queue. So I could get something like this: P1 P3 \ | / \ | / C1 So here the P2 thread has ended and gone away but I still have his Queue lingering. So on a thread I can use is_alive() to check status and use join() to clean up but I don't see any analogous functionality for Queues. How do I kill them? I thought about putting a suicide message on the Queue and then C1 would read it and set the variable to None but i'm not sure setting the variable to None actually makes the Queue go away. It could just end up sitting in memory unreferenced - and that's not good. Additionally, I could have any number of consumer threads reading that Queue so once the first one get the suicide note the other consumer threads never would. 
I figure there has to be an elegant way for managing my Queues but so far I can't find it. Any suggestions would be appreciated and thanks in advance for any help. ps Python rocks. Whoo..the formatting got torn up! My terrible diagrams are even more terrible! Oh well, I think you'll catch my meaning :) -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
EW ericwoodwo...@gmail.com writes: I also might have different consumer threads do different tasks (for example one might write to a log and one might write to SQL) so that again means I can't plan for a set ratio of consumers to producers So it's unknown. So this means that instead of having 1 Queue that all the producers put to and that all the consumers get from I actually have 1 Queue per producer thread That doesn't sound appropriate. Queues can have many readers and many writers. So use one queue per task (logging, SQL, etc), regardless of the number of producer or consumer threads. Any producer with an SQL request sends it to the SQL queue, which can have many listeners. The different SQL consumer threads listen to the SQL queue and pick up requests and handle them. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Aug 11, 1:18 pm, Paul Rubin no.em...@nospam.invalid wrote: EW ericwoodwo...@gmail.com writes: I also might have different consumer threads do different tasks (for example one might write to a log and one might write to SQL) so that again means I can't plan for a set ratio of consumers to producers So it's unknown. So this means that instead of having 1 Queue that all the producers put to and that all the consumers get from I actually have 1 Queue per producer thread That doesn't sound appropriate. Queues can have many readers and many writers. So use one queue per task (logging, SQL, etc), regardless of the number of producer or consumer threads. Any producer with an SQL request sends it to the SQL queue, which can have many listeners. The different SQL consumer threads listen to the SQL queue and pick up requests and handle them. I thought about doing it that way and I could do it that way but it still seems like there should be a way to clean up Queues on my own. If I did it this way then I guess I'd be relying on garbage collection when the script ended to clean up the Queues for me. What if I want to clean up my own Queues? Regardless of the specifics of my current design, I'm just generally curious how people manage cleanup of their Queues when they don't want them any more. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
EW wrote: [snip] So here the P2 thread has ended and gone away but I still have his Queue lingering. So on a thread I can use is_alive() to check status and use join() to clean up but I don't see any analogous functionality for Queues. How do I kill them? I thought about putting a suicide message on the Queue and then C1 would read it and set the variable to None but i'm not sure setting the variable to None actually makes the Queue go away. It could just end up sitting in memory unreferenced - and that's not good. Additionally, I could have any number of consumer threads reading that Queue so once the first one get the suicide note the other consumer threads never would. I figure there has to be an elegant way for managing my Queues but so far I can't find it. Any suggestions would be appreciated and thanks in advance for any help. An object will be available for garbage collection when nothing refers to it either directly or indirectly. If it's unreferenced then it will go away. As for the suicide note, if a consumer sees it then it can put it back into the queue so other consumers will see it and then forget about the queue (set the variable which refers to the queue to None, or, if the references are in a list, delete it from the list). -- http://mail.python.org/mailman/listinfo/python-list
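[MRAB's put-it-back scheme can be sketched like this (Python 3 module names; the SENTINEL object and thread count are illustrative). Note that the note itself deliberately never leaves the queue, so every consumer eventually sees it:]

```python
import queue
import threading

SENTINEL = object()          # a unique "suicide note" no producer would ever send
q = queue.Queue()
done = []

def consumer(name):
    while True:
        item = q.get()
        if item is SENTINEL:
            q.put(SENTINEL)  # put the note back so the other consumers see it too
            break
        # ... handle a real item here ...
    done.append(name)        # record that this consumer shut down cleanly

threads = [threading.Thread(target=consumer, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

q.put(SENTINEL)              # one note is enough: each reader re-posts it
for t in threads:
    t.join()

assert sorted(done) == [0, 1, 2]
```

The queue still holds the sentinel afterwards; once the last variable referring to the queue is dropped, queue and sentinel go away together.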
Re: Queue cleanup
On Aug 11, 1:55 pm, MRAB pyt...@mrabarnett.plus.com wrote: EW wrote: [snip] So here the P2 thread has ended and gone away but I still have his Queue lingering. So on a thread I can use is_alive() to check status and use join() to clean up but I don't see any analogous functionality for Queues. How do I kill them? I thought about putting a suicide message on the Queue and then C1 would read it and set the variable to None but i'm not sure setting the variable to None actually makes the Queue go away. It could just end up sitting in memory unreferenced - and that's not good. Additionally, I could have any number of consumer threads reading that Queue so once the first one get the suicide note the other consumer threads never would. I figure there has to be an elegant way for managing my Queues but so far I can't find it. Any suggestions would be appreciated and thanks in advance for any help. An object will be available for garbage collection when nothing refers to it either directly or indirectly. If it's unreferenced then it will go away. As for the suicide note, if a consumer sees it then it can put it back into the queue so other consumers will see it and then forget about the queue (set the variable which refers to the queue to None, or, if the references are in a list, delete it from the list). Ok great. I wasn't sure about the Garbage collection part of it. That's actually pretty easy. Thanks! -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
EW ericwoodwo...@gmail.com writes: I thought about doing it that way and I could do it that way but it still seems like there should be a way to clean up Queues on my own. If I did it this way then I guess I'd be relying on garbage collection when the script ended to clean up the Queues for me. Oh, I see. As long as it's possible to start new producer or consumer threads that touch a queue, obviously that queue has to still be around. If the program starts all its threads at the beginning, then runs til they exit, then does more stuff, then you could do something like:

# make dictionary of queues, one queue per task type
queues = {'sql': Queue(), 'logging': Queue(), ... }
for i in whatever threads you want:
    threading.Thread(target=your_handler, args=[queues])
del queues

and then when all the threads exit, there are no remaining references to the queues. But why do you care? Queues aren't gigantic structures, they're just a list (collections.deque) with an rlock. It's fine to let the gc clean them up; that's the whole point of having a gc in the first place. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Aug 11, 2:16 pm, Paul Rubin no.em...@nospam.invalid wrote: EW ericwoodwo...@gmail.com writes: I thought about doing it that way and I could do it that way but it still seems like there should be a way to clean up Queues on my own. If I did it this way then I guess I'd be relying on garbage collection when the script ended to clean up the Queues for me. Oh, I see. As long as it's possible to start new producer or consumer threads that touch a queue, obviously that queue has to still be around. If the program starts all its threads at the beginning, then runs til they exit, then does more stuff, then you could do something like:

# make dictionary of queues, one queue per task type
queues = {'sql': Queue(), 'logging': Queue(), ... }
for i in whatever threads you want:
    threading.Thread(target=your_handler, args=[queues])
del queues

and then when all the threads exit, there are no remaining references to the queues. But why do you care? Queues aren't gigantic structures, they're just a list (collections.deque) with an rlock. It's fine to let the gc clean them up; that's the whole point of having a gc in the first place. Well I cared because I thought garbage collection would only happen when the script ended - the entire script. Since I plan on running this as a service it'll run for months at a time without ending. So I thought I was going to have heaps of Queues hanging out in memory, unreferenced and unloved. It seemed like bad practice so I wanted to get out ahead of it. But the GC doesn't work the way I thought it worked so there's really no problem I guess. I was just confused on garbage collection it seems. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
EW ericwoodwo...@gmail.com writes: Well I cared because I thought garbage collection would only happen when the script ended - the entire script. Since I plan on running this as a service it'll run for months at a time without ending. So I thought I was going to have heaps of Queues hanging out in memory, unreferenced and unloved. It seemed like bad practice so I wanted to get out ahead of it. Even if GC worked that way it wouldn't matter, if you use just one queue per type of task. That number should be a small constant so the memory consumption is small. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
Paul Rubin wrote: EW ericwoodwo...@gmail.com writes: Well I cared because I thought garbage collection would only happen when the script ended - the entire script. Since I plan on running this as a service it'll run for months at a time without ending. So I thought I was going to have heaps of Queues hanging out in memory, unreferenced and unloved. It seemed like bad practice so I wanted to get out ahead of it. Even if GC worked that way it wouldn't matter, if you use just one queue per type of task. That number should be a small constant so the memory consumption is small. That's basically how _non_-garbage-collected languages work! :-) -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
On Aug 11, 2:52 pm, Paul Rubin no.em...@nospam.invalid wrote: EW ericwoodwo...@gmail.com writes: Well I cared because I thought garbage collection would only happen when the script ended - the entire script. Since I plan on running this as a service it'll run for months at a time without ending. So I thought I was going to have heaps of Queues hanging out in memory, unreferenced and unloved. It seemed like bad practice so I wanted to get out ahead of it. Even if GC worked that way it wouldn't matter, if you use just one queue per type of task. That number should be a small constant so the memory consumption is small. Well I can't really explain it but 1 Queue per task for what I'm designing just doesn't feel right to me. It feels like it will lack future flexibility. I like having 1 Queue per producer thread object and the person instantiating that object can do whatever he wants with that Queue. I can't prove I'll need that level of flexibility but I don't see why it's bad to have. It's still a small number of Queues, it's just a small, variable, number of Queues. -- http://mail.python.org/mailman/listinfo/python-list
Re: Queue cleanup
EW ericwoodwo...@gmail.com writes: Well I can't really explain it but 1 Queue per task for what I'm designing just doesn't feel right to me. It feels like it will lack future flexibility. That makes no sense at all. Multiple readers and writers per queue are the way Python queues are designed to work. The normal way to spray a bunch of concurrent tasks to worker threads is just have a bunch of workers listening to one queue. It's the same way at the producer end. -- http://mail.python.org/mailman/listinfo/python-list
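[The many-workers-one-queue arrangement Paul describes, in outline (Python 3 spellings; the worker count and the use of None as a per-worker shutdown sentinel are illustrative choices, not part of his post):]

```python
import queue
import threading

tasks = queue.Queue()      # one queue per type of task; many producers, many consumers
results = queue.Queue()

def worker():
    while True:
        n = tasks.get()
        if n is None:      # per-worker shutdown sentinel
            break
        results.put(n * n) # stand-in for real work (SQL, logging, ...)

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for n in range(10):        # any producer just puts work on the shared queue
    tasks.put(n)
for _ in workers:          # one sentinel per worker so each exits exactly once
    tasks.put(None)
for w in workers:
    w.join()

squared = sorted(results.get() for _ in range(10))
assert squared == [n * n for n in range(10)]
```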