Re: Queue cleanup

2010-09-12 Thread Gregory Ewing

Dennis Lee Bieber wrote:

greg.ew...@canterbury.ac.nz declaimed the following


So maybe we need to redesign the hardware.


Remember the iAPX-432?
http://en.wikipedia.org/wiki/Intel_iAPX_432#Garbage_collection


Not quite what I had in mind. That sounds like a conventional
GC algorithm that happens to be implemented in microcode. I'm
thinking about ways of designing the memory itself to help
with GC. Instead of putting all the smarts in the CPU, move
some of them into the RAM.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-09 Thread John Nagle

On 9/8/2010 6:20 PM, Paul Rubin wrote:

What tricks on tricks?  Even the fanciest GC's are orders of magnitude
less complicated than any serious database, optimizing compiler, OS
kernel, file system, etc.  Real-world software is complicated.  Get used
to that fact, and look for ways to manage the complexity and keep
complex programs safe.  Choosing to program with unsafe methods because
you wish the complexity didn't exist is just deluded.


   Garbage collectors are difficult from a theoretical standpoint,
and it's very difficult to make a correct concurrent garbage collector
without using formal methods.  But garbage collectors are not large
pieces of code.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-08 Thread Lawrence D'Oliveiro
In message 7xeid9gtuq@ruckus.brouhaha.com, Paul Rubin wrote:

 Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

 That reinforces my point, about how easy it was to check the correctness
 of the code. In this case one simple fix, like this ...
 would render the code watertight. See how easy it is?
 
 Well, no, it's irrelevant how easy it is to fix the issue after it's
 pointed out.

In that case, I can similarly discount your attempts to fix up issues with 
garbage collectors after they’re pointed out, can I not?

 Part of the problem is C itself.

And yet, what are these complicated garbage collectors, that you intend 
relying on to work correctly with all their layers of tricks upon tricks, 
written in? C.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-08 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 In that case, I can similarly discount your attempts to fix up issues with 
 garbage collectors after they’re pointed out, can I not?

Adapting GC algorithms to improve their performance or better use the
capabilities of new hardware is much different than saying they didn't
work in the first place.  They've been around for 5 decades, they (like
Python programs) work fine if you don't mind the performance cost, and
for many applications that cost is acceptable even for quite simple and
naive GC's.  The continued work is to get that cost down even further.
(And before criticizing GC performance, Have you ever profiled CPython
to see how much time it's spending messing with reference counts?  I
didn't think so).

 Part of the problem is C itself.
 And yet, what are these complicated garbage collectors, that you intend 
 relying on to work correctly with all their layers of tricks upon tricks, 
 written in? C.

What tricks on tricks?  Even the fanciest GC's are orders of magnitude
less complicated than any serious database, optimizing compiler, OS
kernel, file system, etc.  Real-world software is complicated.  Get used
to that fact, and look for ways to manage the complexity and keep
complex programs safe.  Choosing to program with unsafe methods because
you wish the complexity didn't exist is just deluded.


It actually seems pretty crazy to me to write a garbage collector in C
today, even though a GC needs unsafe pointer operations in a few
places.  C made some sense in the 1980's when computers were smaller and
people didn't know better.  A lot of the C code around today is legacy
code from that era.  These days it makes more sense to use a safe
language with a few isolated jailbreaks (like an OS kernel that has a
few lines of assembly code) than to write the whole thing in a language
whose unsafety is everywhere.

Here's a good paper by R. Fateman (criticizing C) that I just came across:

   
http://www.franz.com/resources/educational_resources/white_papers/fault.prevention.pdf

He suggests using Lisp instead, but one can't have everything ;-).

FWIW, here are a couple pages about verifying GC's:

  http://flint.cs.yale.edu/flint/publications/hgc.html
  http://www.cs.technion.ac.il/~erez/Papers/verified-gc-popl09.pdf 
  etc.

I just don't understand that weird agenda you seem to have. But
whatever.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-08 Thread Steven D'Aprano
On Thu, 09 Sep 2010 12:41:20 +1200, Lawrence D'Oliveiro wrote:

 Part of the problem is C itself.
 
 And yet, what are these complicated garbage collectors, that you intend
 relying on to work correctly with all their layers of tricks upon
 tricks, written in? C.

Not necessarily.

Pascal, despite the contempt it is held in by university computer science 
departments, isn't quite dead, and some Pascal compilers use garbage 
collectors written in Pascal. FreePascal, I believe, is one of them.

Likewise for other not-dead-yet low-level languages like Ada and Forth. 
As surprising as it seems to many, C is not the only low-level language 
around suitable for writing high-quality, efficient code. Just ask the 
Lisp community, which is thriving. For some definition of thriving.

Admittedly C has had far more attention paid to it than the others, so [insert 
weasel words here] the best C compilers tend to produce more efficient 
code than the best of the others, but Pascal, Ada and similar give you 
more security than C.

I believe that when computer scientists of the future look back at the 
last few decades, they will judge that on balance C did more harm than 
good. Not that C is the only language in which people can write buggy or 
insecure code, but C does tend to give the bugs so much help... :)



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-07 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 But you’ll notice that Free Software comes with no such
 restrictions. In fact, it is contrary to commonly-accepted Free
 Software guidelines to impose any sort of restrictions on areas of use.

Free software comes with an explicit disclaimer of liability (you get
what you pay for).  The Sun stuff ($) may have either an express or
implied warranty that could mean they get hit up for damages if the
software is wrong.  IANAL YMMV etc.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-07 Thread Gregory Ewing

Paul Rubin wrote:


Now extrapolate that error rate from 30 lines to a program the size of
Firefox (something like 5 MLOC), and you should see how fraught with
danger that style of programming is.


But you don't write 5 MLOC of code using that programming style.
You use it to write a small core something along the lines of,
oh, I don't know, a Python interpreter, and then write the rest
of the code on top of that platform.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-07 Thread Gregory Ewing

Lawrence D'Oliveiro wrote:

But alone of all of these, garbage collection still remains just as costly 
to implement as ever. That should tell you something about how poorly it 
matches the characteristics of modern computing hardware.


So maybe we need to redesign the hardware.

Perhaps reference counts could be stored in their own
special area of memory, updated in parallel with main
memory accesses so that they don't slow anything down.
Make it multiported so that all your cores can access
it at once without locking. Maybe even build it out of
counters instead of registers, so there's no data bus,
only an address bus and inc/dec control lines.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-06 Thread Paul Rubin
John Nagle na...@animats.com writes:
 I've argued for an approach in which only synchronized or immutable
 objects can be shared between threads.  Then, only synchronized objects
 have refcounts.  See
 http://www.animats.com/papers/languages/pythonconcurrency.html

I'm going to have to read this carefully, but my first reaction is that
doing it right would take substantial changes to the language, to the
point where it wouldn't really be Python any more.  

More generally, if you want to program in Erlang, why not use Erlang for
that?

 I can't think of a time when I've really had to use a finalizer for
 something with dynamic extent.  They've always seemed like a code
 smell to me.

   The problem appears when you have an object that owns something, like
a window or a database connection.  "With" is single-level.

But expecting such an object to be gc'd seems like a code smell in its
own right.  I once implemented something like that in a Lisp system, and
it was just disconcerting as hell to see windows on the screen blink out
of existence unpredictably.  The issue is maybe your function returns
and you expect the window to vanish, but something somewhere has saved
another temporary reference to it so it doesn't go away til later.  If
you want something with external semantics (like a window or file
handle) to be released at a particular time, the program should do that
explicitly.  Don't use a finalizer that you expect storage reclamation
to invoke.  There is just too little control in a Python program over
the creation of references. 
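
To make that concrete, here's a toy Python sketch (Window and its
methods are invented for the example):

class Window(object):
    """Hypothetical stand-in for something with external semantics."""
    def __init__(self, title):
        self.title = title
        self.closed = False
    def close(self):
        self.closed = True          # deterministic, explicit release
    def __del__(self):
        if not self.closed:         # finalizer runs whenever the last
            self.close()            # reference happens to disappear

saved = []

def show_dialog():
    w = Window("demo")
    saved.append(w)    # something somewhere keeps another reference...
    # ...so when this function returns, __del__ does NOT run and the
    # window lingers until 'saved' lets go of it.

def show_dialog_explicitly():
    w = Window("demo")
    try:
        pass           # ... work with the window ...
    finally:
        w.close()      # released right here, stray references or not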
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-05 Thread Paul Rubin
John Nagle na...@animats.com writes:
 Unoptimized reference counting, which is what CPython does, isn't
 all that great either.  The four big bottlenecks in Python are boxed
 numbers, attribute lookups, reference count updates, and the GIL.

The performance hit of having to lock the refcounts before update has
been the historical reason for keeping the GIL.  The LOCK prefix takes
something like 100 cycles on an x86.  Is optimizing the refcount updates
going to come anywhere near making up for that?

Python's with statement as an approach to RAII has seemed ok to me.  I
can't think of a time when I've really had to use a finalizer for
something with dynamic extent.  They've always seemed like a code smell
to me.
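
For instance (using sqlite3 purely as a stand-in for an object that
owns an external resource):

import sqlite3
from contextlib import contextmanager

@contextmanager
def connection(path):
    conn = sqlite3.connect(path)    # acquire
    try:
        yield conn
    finally:
        conn.close()                # release, even on an exception

with connection(":memory:") as conn:
    conn.execute("create table t (x integer)")
# the connection is closed here, deterministically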
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-05 Thread John Nagle

On 9/4/2010 11:51 PM, Paul Rubin wrote:

John Nagle na...@animats.com writes:

 Unoptimized reference counting, which is what CPython does, isn't
all that great either.  The four big bottlenecks in Python are boxed
numbers, attribute lookups, reference count updates, and the GIL.


The performance hit of having to lock the refcounts before update has
been the historical reason for keeping the GIL.  The LOCK prefix takes
something like 100 cycles on an x86.  Is optimizing the refcount updates
going to come anywhere near making up for that?


I've argued for an approach in which only synchronized or immutable
objects can be shared between threads.  Then, only synchronized objects
have refcounts.  See
http://www.animats.com/papers/languages/pythonconcurrency.html

Guido doesn't like it.  He doesn't like any restrictions.
So we're stuck dragging around the boat anchor.
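
For flavor, here's a toy sketch of the synchronized-objects idea in
present-day Python (the Synchronized wrapper is invented; real support
would need language changes):

import threading

class Synchronized(object):
    # Toy wrapper: every access to the wrapped object goes through one
    # lock.  Under the proposal only objects like this (or immutable
    # ones) could be shared between threads, and only they would need
    # thread-safe refcount updates.
    def __init__(self, obj):
        self._obj = obj
        self._lock = threading.Lock()
    def apply(self, func, *args):
        with self._lock:
            return func(self._obj, *args)

shared = Synchronized([])
shared.apply(lambda lst, item: lst.append(item), 42)  # safe from any thread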

I'd hoped that the Unladen Swallow people might come up with some
really clever solution, but they seem to be stuck.  It's been almost
a year since the last quarterly release.  Maybe Google is putting their
effort into Go.

What's so striking is that Shed Skin can deliver 20x to 60x
speedups over CPython, while PyPy and Unladen Swallow have
trouble getting 1.5x.  The question is how much one has to
restrict the language to get a serious performance improvement.


Python's with statement as an approach to RAII has seemed ok to me.  I
can't think of a time when I've really had to use a finalizer for
something with dynamic extent.  They've always seemed like a code smell
to me.


   The problem appears when you have an object that owns something, like
a window or a database connection.  "With" is single-level.

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread Paul Rubin
Dennis Lee Bieber wlfr...@ix.netcom.com writes:
   Not to mention having to ensure that one finds ALL the references to
 the object so that they can be updated to the new address! Somehow I
 don't see a C compiler being smart enough to find intermediary pointer ...

We're not talking about C compilers, which can cast any value at all
into pointers.  Languages designed for garbage collection are normally
type-safe and gc is a well-understood problem, though (like compilers)
the higher-performing ones are complicated.  But, nothing in principle
stops anyone from implementing Python with such methods.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread Aahz
[gc]

In article 7x7hj2kyd6@ruckus.brouhaha.com,
Paul Rubin  no.em...@nospam.invalid wrote:

A minimal naive implementation indeed doubles the memory requirements,
but from a Python perspective where every integer takes something like
24 bytes already, even that doesn't seem so terrible.  

Many people still use 32-bit Python -- an int is twelve bytes there.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

...if I were on life-support, I'd rather have it run by a Gameboy than a
Windows box.  --Cliff Wells
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 That reinforces my point, about how easy it was to check the correctness of 
 the code. In this case one simple fix, like this ...
 would render the code watertight. See how easy it is?

Well, no, it's irrelevant how easy it is to fix the issue after it's
pointed out.  What matters is how easy it was to create it in the first
place.  You posted a 30-line code sample as obviously free of memory
leaks, but even a good programmer like you didn't notice that it had the
potential for a nasty memory overwrite error after an unbalanced decref.
Keep in mind that a memory leak usually just means the program can
eventually bog down and stops working, but an overwrite error is often a
security hole that can lead to total compromise of your entire computer.
Now extrapolate that error rate from 30 lines to a program the size of
Firefox (something like 5 MLOC), and you should see how fraught with
danger that style of programming is.  Even the most skilled and careful
programmers are going to slip up once in a while.

Part of the problem is C itself.  No language can eliminate many
complicated bugs without creating severe usability problems, but good
languages (unlike C) can eliminate most silly bugs.  I had a dark
night of the soul after reading some of the bug-finding papers at

  http://www.stanford.edu/~engler/

and have been terrified of C ever since.  I'm just always skeptical when
anyone says they're sure any piece of C code is obviously bug-free.
It's quite easy to get overconfident about it (as I've done myself more
than once).  I spent about 5 minutes reviewing your patched code (and
the underlying implementations of the C API functions it calls) and
didn't see any other issues, and the code is probably ok now, but I'd
have to spend a lot more time tracing through the API layer before I
could be really sure.

Anyway, you should check your patch into github if you haven't.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread John Nagle

On 8/28/2010 5:42 AM, Aahz wrote:

In article 4c78572c$0$28655$c3e8...@news.astraweb.com,
Steven D'Aprano st...@remove-this-cybersource.com.au wrote:

On Fri, 27 Aug 2010 09:16:52 -0700, Aahz wrote:

In article mailman.1967.1281549328.1673.python-l...@python.org, MRAB
pyt...@mrabarnett.plus.com  wrote:


An object will be available for garbage collection when nothing refers
to it either directly or indirectly. If it's unreferenced then it will
go away.


This isn't actually garbage collection as most people think of it.
Refcounting semantics mean that objects get reaped as soon as nothing
points at them.  OTOH, CPython does also have garbage collection to back
up refcounting so that when you have unreferenced object cycles they
don't stay around.


I've repeatedly asked, both here and elsewhere, why reference counting
isn't real garbage collection. Nobody has been able to give me a
satisfactory answer. As far as I can tell, it's a bit of pretentiousness
with no basis in objective fact.


You'll notice that I was very careful to qualify my statement with "as
most people think of it".  Also, because CPython has two different memory
management mechanisms, refcounting and cycle detection, and the module
that controls cycle detection is called gc, I think it's simpler to
follow along with the Python docs -- and critically important to remind
people that there are in fact two different systems.


   Personally, I'd like to have reference counting only, an enforced
prohibition on loops (backpointers must be weak pointers), RAII,
and reliably ordered finalization.

   A big advantage of reference counting is that finalization happens
in the thread that releases the object, and in the correct order.
GC and finalization/destructors do not play well together at all.
Microsoft once tried to get the hard cases to work right.  See
managed C++.  Not a happy story.

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread Lawrence D'Oliveiro
In message mailman.434.1283568372.29448.python-l...@python.org, MRAB 
wrote:

 Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

 Wonder why Sun’s licence explicitly forbade its use in danger-critical
 areas like nuclear power plants and the like, then?
 
 I thought it was just that if it wasn't explicitly forbidden then
 someone might try to use it and then sue if something went wrong, even
 though common sense would have said that it was a bad idea in the first
 place! :-)

But you’ll notice that Free Software comes with no such restrictions. In 
fact, it is contrary to commonly-accepted Free Software guidelines to impose 
any sort of restrictions on areas of use.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread Lawrence D'Oliveiro
In message 4c82b097$0$1661$742ec...@news.sonic.net, John Nagle wrote:

 Personally, I'd like to have reference counting only, an enforced
 prohibition on loops (backpointers must be weak pointers), RAII,
 and reliably ordered finalization.

Is there a cheap way of checking at runtime for circular structures?

 A big advantage of reference counting is that finalization happens
 in the thread that releases the object, and in the correct order.
 GC and finalization/destructors do not play well together at all.
 Microsoft once tried to get the hard cases to work right.  See
 managed C++.  Not a happy story.

Thank you for that. Another arrow for my anti-GC quiver. :)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread Lawrence D'Oliveiro
In message 7x7hj2kyd6@ruckus.brouhaha.com, Paul Rubin wrote:

 Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

 In message 7xmxs2uez1@ruckus.brouhaha.com, Paul Rubin wrote:

 GC's for large systems generally don't free (or even examine) individual
 garbage objects.  They copy the live objects to a new contiguous heap
 without ever touching the garbage, and then they release the old heap.

 And suddenly you’ve doubled the memory requirements. And on top of that,
 since you’re moving the valid objects into different memory, you’re
 forcing cache misses on all of them as well.
 
 A minimal naive implementation indeed doubles the memory requirements,
 but from a Python perspective where every integer takes something like
 24 bytes already, even that doesn't seem so terrible.

Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding, right?

 More sophisticated implementations use multiple small heaps or other
 tricks.

More and more complications to patch up the idea. At some point, you have to 
admit there is something fundamentally flawed about the whole concept.

 The new heap is filled sequentially so accesses to it will have good
 locality.

Unfortunately, that’s not how locality of reference works. It doesn’t matter 
whether the objects being accessed are close together in memory or far apart 
(not with modern fully-associative caches, anyway), what does matter is the 
frequency distribution of references, namely that the vast majority of 
references are to a tiny minority of objects.

Your generational garbage collector completely breaks this assumption, by 
regularly forcing an access to every single object in the heap. Cache-
thrashing, anyone?

 It's also the case that programs with very large memory consumption tend
 to use most of the memory for large arrays that don't contain pointers
 (think of a database server with a huge cache).  That means the gc
 doesn't really have to think about all that much of the memory.

But your generational garbage collector still has to copy those very large 
objects to the new heap, with all the cache-hostile consequences therefrom.

By the way, isn’t this the opposite of the array-of-pointers example you 
were using earlier to try to cast reference-counting in a bad light? It 
seems to me a reference count would work very well for such a large, simple 
object.

 This is the continuing problem with garbage collection: all the attempts
 to make it cheaper just end up moving the costs somewhere else.
 
 Same thing with manual allocation.  That moves the costs off the
 computer and onto the programmer.  Not good, most of the time.

Unfortunately, your argument falls down. It is a truism that hardware costs 
continue to come down, while programmers remain expensive. As I said before, 
computing performance has improved by something like five orders of 
magnitude over the last half-century. This has rendered all kinds of 
techniques, like high-level languages, dynamic memory allocation, 
stacks, hardware floating-point, memory protection and so on, which were 
once considered controversial because of their expense, cheap enough to 
become commonplace.

But not garbage collection. This is because of the asymmetric way in which 
hardware has become faster: the biggest improvements have been in the parts 
that were already the fastest to begin with (the CPU), while RAM speeds have 
improved much less, and backing-store speeds least of all. Hence the need 
for intermediate layers of cache to bridge the gap. But the effectiveness of 
that caching depends crucially on certain assumptions about the runtime 
behaviour of the programs: and garbage collection breaks those assumptions.

 Really, I'm no gc expert, but the stuff you're saying about gc is quite
 ill-informed.  You might want to check out some current literature.

You may want to enlighten yourself by meditating on this seeming paradox of 
modern computing hardware: memory is cheap, but accessing memory is 
expensive.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 A minimal naive implementation indeed doubles the memory requirements,
 but from a Python perspective where every integer takes something like
 24 bytes already, even that doesn't seem so terrible.

 Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding, right?

No, it would be doubling 4 or 8 bytes.  The extra overhead like the
reference count would not be there to bloat up the integer like in
Python.

 More sophisticated implementations use multiple small heaps or other tricks.
 More and more complications to patch up the idea. At some point, you have to 
 admit there is something fundamentally flawed about the whole concept.

Oh sheesh, that's silly.  Every area of programming where performance
matters is full of optimization tricks.  Look at any serious database
implementation for example.  Or any compiler.  Look at Python's
implementation of dictionaries.  Yeah, the optimizations add complexity
to improve performance, sometimes even in heuristic ways that can fail.
That doesn't mean the concepts are fundamentally flawed.  GC is no
different.

 The new heap is filled sequentially so accesses to it will have good
 locality.
 what does matter is the frequency distribution of references,

Sorry, just I meant during the gc operation itself.  The gc's access
pattern in the new heap is completely sequential as the gc just copies
stuff to it linearly from from the old heap, bumping a pointer upwards.
The access pattern when the user application is running is of course not
predictable.

 Your generational garbage collector completely breaks this assumption, by 
 regularly forcing an access to every single object in the heap. Cache-
 thrashing, anyone?

In the minor collections, the whole minor heap fits in cache, so there's
no thrashing.  The major collections can use a different strategy, or
you can simply rely on their relative infrequency.  Why do you speculate
like this?  If you run a GHC program with profiling active, it tells you
exactly how much time is spent in minor gc and how much time is in major
gc, and it's all generally pretty tolerable unless your program has
bugs.  (Unfortunately Haskell programs are notoriously susceptible to a
certain type of bug that causes them to still give the right answers,
but use much more memory than they should.  The usual sign of that
happening is high gc load).

 It's also the case that programs with very large memory consumption tend
 to use most of the memory for large arrays that don't contain pointers
 But your generational garbage collector still has to copy those very large 
 objects to the new heap, with all the cache-hostile consequences therefrom.

Not necessarily, depends on how you write the program and how the gc works.

 By the way, isn’t this the opposite of the array-of-pointers example you 
 were using earlier to try to cast reference-counting in a bad light?

I wasn't trying to cast reference counting in a bad light, I was
pointing out that reference counting can experience pauses just like
traditional gc approaches.  Most programs including soft real time
programs can tolerate an occasional pause.  If your program is not one
of those, and you need guaranteed upper bounds on pauses so you can't
use traditional gc, switching from gc to reference counting won't save
you.

 It seems to me a reference count would work very well for such a
 large, simple object.

Mark/sweep would do it too.  Some gc's use a hybrid approach, with
mark/sweep for older or larger objects.
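
In skeleton form, mark/sweep is about this simple (toy objects,
invented for the example):

class Obj(object):
    def __init__(self):
        self.refs = []          # outgoing pointers
        self.marked = False

def mark(roots):
    stack = list(roots)
    while stack:
        o = stack.pop()
        if not o.marked:
            o.marked = True
            stack.extend(o.refs)

def sweep(heap):
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False        # reset for the next collection
    return live                 # everything not marked is garbage

a, b, c = Obj(), Obj(), Obj()
a.refs.append(b)                # c is unreachable from the root
heap = [a, b, c]
mark([a])
heap = sweep(heap)              # heap is now [a, b]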

 But not garbage collection. This is because of the asymmetric way in
 which hardware has become faster:...  the effectiveness of that
 caching depends crucially on certain assumptions about the runtime
 behaviour of the programs: and garbage collection breaks those
 assumptions. ...
 You may want to enlighten yourself by meditating on this seeming paradox of 
 modern computing hardware: memory is cheap, but accessing memory is 
 expensive.

I'm interested in receiving enlightment if you've got some pointers into
the research literature that back up your views.  Right now it sounds
like you're going by some intuitions you have that aren't backed up by
evidence.  Anyone who has performance-tuned a program knows that
intuition can give reasonable starting points for experiments and
measurements, but once the results start coming in, a lot of the
intuitions end up invalidated.  

By now, there is enough experimental and theoretical literature about gc
that opinions like yours, that don't seem to be grounded in any
knowledge of that literature, are not very persuasive no matter how
superficially attractive the raw intuition might be.  Especially in your
case, where you seem to have decided ahead of time what conclusion you
want to reach and are looking for ways to justify it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-04 Thread John Nagle

On 9/4/2010 6:44 PM, Lawrence D'Oliveiro wrote:

In message 4c82b097$0$1661$742ec...@news.sonic.net, John Nagle wrote:


 Personally, I'd like to have reference counting only, an enforced
prohibition on loops (backpointers must be weak pointers), RAII,
and reliably ordered finalization.


Is there a cheap way of checking at runtime for circular structures?


   It's an interesting technical problem to design a system where
circular references are detected immediately, at the moment
of creation.  However, Python already detects loops during
garbage collection.  If you set

gc.set_debug(gc.DEBUG_SAVEALL)

all the loops show up in gc.garbage.
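
A quick demonstration (Python 2.x, as in the rest of this thread):

import gc

class Node(object):
    pass

a = Node(); b = Node()
a.partner = b; b.partner = a     # a reference cycle
del a, b                         # refcounts never reach zero

gc.set_debug(gc.DEBUG_SAVEALL)
print gc.collect()               # number of unreachable objects found
print gc.garbage                 # the cycle, saved instead of freed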


 A big advantage of reference counting is that finalization happens
in the thread that releases the object, and in the correct order.
GC and finalization/destructors do not play well together at all.
Microsoft once tried to get the hard cases to work right.  See
managed C++.  Not a happy story.


Thank you for that. Another arrow for my anti-GC quiver. :)


Unoptimized reference counting, which is what CPython does, isn't
all that great either.  The four big bottlenecks in Python are boxed
numbers, attribute lookups, reference count updates, and the GIL.

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-03 Thread Lawrence D'Oliveiro
In message 7xvd6sv0n4@ruckus.brouhaha.com, Paul Rubin wrote:

 Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
  AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
  LenObj = PyTuple_GetItem(TheBufferInfo, 1);
 
 the first PyTuple_GetItem succeeds and the second one fails.

 Admittedly, I did take a shortcut here: array.buffer_info returns a tuple
 of two items, so I’m not expecting one GetItem to succeed and the other
 to fail.
 
 FromArray is a parameter to the function, with no type check to make
 sure it's really an array.  In fact your code allows for the possibility
 that it doesn't support the buffer_info operation (if I understand the
 purpose of the null return check after the PyObject_CallMethod) which
 means it's prepared for the argument to -not- be an array.

That reinforces my point, about how easy it was to check the correctness of 
the code. In this case one simple fix, like this

diff --git a/spuhelper.c b/spuhelper.c
index 83fd4eb..2ba8197 100644
--- a/spuhelper.c
+++ b/spuhelper.c
@@ -151,10 +151,12 @@ static void GetBufferInfo
if (TheBufferInfo == 0)
break;
AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
-   LenObj = PyTuple_GetItem(TheBufferInfo, 1);
if (PyErr_Occurred())
break;
Py_INCREF(AddrObj);
+   LenObj = PyTuple_GetItem(TheBufferInfo, 1);
+   if (PyErr_Occurred())
+   break;
Py_INCREF(LenObj);
*addr = PyInt_AsUnsignedLongMask(AddrObj);
*len = PyInt_AsUnsignedLongMask(LenObj);

would render the code watertight. See how easy it is?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-03 Thread Lawrence D'Oliveiro
In message 7xiq2que93@ruckus.brouhaha.com, Paul Rubin wrote:

 Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

 Refcounting is susceptible to the same pauses for reasons already
 discussed.

 Doesn’t seem to happen in the real world, though.
 
 def f(n):
 from time import time
 a = [1] * n
 t0 = time()
 del a
 t1 = time()
 return t1 - t0
 
 for i in range(9):
     print i, f(10**i)
 
 
 on my system prints:
 
 0 2.86102294922e-06
 1 2.14576721191e-06
 2 3.09944152832e-06
 3 1.00135803223e-05
 4 0.000104904174805
 5 0.00098991394043
 6 0.00413608551025
 7 0.037693977356
 8 0.362598896027
 
 Looks pretty linear as n gets large.  0.36 seconds (the last line) is a
 noticable pause.

Which just proves the point. You had to deliberately set up the situation to 
make that happen. And it remains just as easy to pinpoint where it is 
happening, so you can control it.

With a garbage collector, you don’t have that control. Even if you try to 
avoid freeing a single large structure at once, it’s still liable to batch 
up a lot of small objects to free at once, so the problem can still happen.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-03 Thread Lawrence D'Oliveiro
In message 7xmxs2uez1@ruckus.brouhaha.com, Paul Rubin wrote:

 Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

 Whereas garbage collection will happen at some indeterminate time long
 after the last access to the object, when it very likely will no longer
 be in the cache, and have to be brought back in just to be freed,
 
 GC's for large systems generally don't free (or even examine) individual
 garbage objects.  They copy the live objects to a new contiguous heap
 without ever touching the garbage, and then they release the old heap.

And suddenly you’ve doubled the memory requirements. And on top of that, 
since you’re moving the valid objects into different memory, you’re forcing 
cache misses on all of them as well.

This is the continuing problem with garbage collection: all the attempts to 
make it cheaper just end up moving the costs somewhere else.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-03 Thread Lawrence D'Oliveiro
In message 7xr5heufhb@ruckus.brouhaha.com, Paul Rubin wrote:

 Java has considerably greater reputation for reliability than C or C++.

Wonder why Sun’s licence explicitly forbade its use in danger-critical areas 
like nuclear power plants and the like, then?

 Ada is a different story, but Ada programs (because of the application
 area Ada is used in) tend not to use a lot of dynamic memory allocation
 in the first place.  A little googling shows there are GC extensions
 available for Ada, though I don't know if they are used much.

Let’s put it this way: the life-support system on the International Space 
Station is written in Ada. Would you trust your life to code written in 
Java?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-03 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 Java has considerably greater reputation for reliability than C or C++.

 Wonder why Sun’s licence explicitly forbade its use in danger-critical
 areas like nuclear power plants and the like, then?

Probably because Sun lawyers demanded it.  Is there a Sun C or C++
compiler with a license that doesn't have that restriction?  Even if
there is, it just means those languages are so unreliable that the
lawyers felt confident that any meltdown could be blamed on a bug in the
user's code rather than the compiler ;-).

 Let’s put it this way: the life-support system on the International Space 
 Station is written in Ada. Would you trust your life to code written in 
 Java?

The scary thing is I don't know whether I'm already doing that.  Life
support systems have hard real-time requirements (Ada's forte) but I'd
expect lots of military decision-support systems are written in Java.
Maybe one of them will raise a false alert and somebody will launch a
war.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-03 Thread MRAB

On 04/09/2010 03:21, Paul Rubin wrote:

Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

Java has considerably greater reputation for reliability than C or C++.


Wonder why Sun’s licence explicitly forbade its use in danger-critical
areas like nuclear power plants and the like, then?


Probably because Sun lawyers demanded it.  Is there a Sun C or C++
compiler with a license that doesn't have that restriction?  Even if
there is, it just means those languages are so unreliable that the
lawyers felt confident that any meltdown could be blamed on a bug in the
user's code rather than the compiler ;-).


Let’s put it this way: the life-support system on the International Space
Station is written in Ada. Would you trust your life to code written in
Java?


The scary thing is I don't know whether I'm already doing that.  Life
support systems have hard real-time requirements (Ada's forte) but I'd
expect lots of military decision-support systems are written in Java.
Maybe one of them will raise a false alert and somebody will launch a
war.


I thought it was just that if it wasn't explicitly forbidden then
someone might try to use it and then sue if something went wrong, even
though common sense would have said that it was a bad idea in the first
place! :-)
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-03 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 GC's for large systems generally don't free (or even examine) individual
 garbage objects.  They copy the live objects to a new contiguous heap
 without ever touching the garbage, and then they release the old heap.

 And suddenly you’ve doubled the memory requirements. And on top of that, 
 since you’re moving the valid objects into different memory, you’re forcing 
 cache misses on all of them as well.

A minimal naive implementation indeed doubles the memory requirements,
but from a Python perspective where every integer takes something like
24 bytes already, even that doesn't seem so terrible.  More
sophisticated implementations use multiple small heaps or other tricks.
It still costs something in memory footprint, but less than the minimal
implementation's 2x cost.

The new heap is filled sequentially so accesses to it will have good
locality.  You do have to jump around within the old heap, but again,
with generational schemes, in the more frequent collections, the old
heap fits entirely in cache.  For example, GHC's minor heap size is
256kB.  For major collections, GHC switches (or used to) from copying to
a mark/compact scheme once the amount of live data in the heap goes over
some amount, giving the best of both worlds.

It's also the case that programs with very large memory consumption tend
to use most of the memory for large arrays that don't contain pointers
(think of a database server with a huge cache).  That means the gc
doesn't really have to think about all that much of the memory.

 This is the continuing problem with garbage collection: all the attempts to 
 make it cheaper just end up moving the costs somewhere else.

Same thing with manual allocation.  That moves the costs off the
computer and onto the programmer.  Not good, most of the time.

Really, I'm no gc expert, but the stuff you're saying about gc is quite
ill-informed.  You might want to check out some current literature.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-02 Thread John Nagle

On 8/30/2010 12:22 AM, Paul Rubin wrote:

I guess that is how the so-called smart pointers in the Boost C++
template library work.  I haven't used them so I don't have personal
experience with how convenient or reliable they are, or what kinds of
constraints they imposed on programming style.  I've always felt a bit
suspicious of them though, and I seem to remember Alex Martelli (I hope
he shows up here again someday) advising against using them.


   Smart pointers in C++ have never quite worked right.  They
almost work.  But there always seems to be something that needs
access to a raw C pointer, which breaks the abstraction.
The mold keeps creeping through the wallpaper.

   Also, since they are a bolt-on at the macro level in C++,
reference count updates aren't optimized and hoisted out of
loops.  (They aren't in CPython either, but there have been reference
counted systems that optimize out most reference count updates.)

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-02 Thread Paul Rubin
Dennis Lee Bieber wlfr...@ix.netcom.com writes:
 GC's for large systems ... copy the live objects to a new contiguous heap

   That sounds suspiciously like the original Macintosh OS, with its
 handles... IE, double-indirection. 

Nah, a double indirection on every access would be a terrible
performance hit.  The classic approach is when you move an object to the
new heap, you leave a tagged forwarding pointer at its former location
in the old heap, giving its location in the new heap.  Then as you move
other objects, you dereference the pointers in them to see whether they
point to moved or unmoved objects, and relocate any unmoved ones.  A
more complete explanation is here:

http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-33.html#%_sec_5.3.2 
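
In skeleton form the scheme looks something like this (toy Python,
all names invented):

class Cell(object):
    def __init__(self, refs=()):
        self.refs = list(refs)   # outgoing pointers
        self.forward = None      # forwarding pointer, set when moved

def move(obj, new_heap):
    if obj.forward is None:      # first visit: copy, and leave a
        clone = Cell(obj.refs)   # forwarding pointer behind
        new_heap.append(clone)
        obj.forward = clone
    return obj.forward           # later visits just follow it

def collect(roots, old_heap):
    new_heap = []
    roots = [move(r, new_heap) for r in roots]
    scan = 0
    while scan < len(new_heap):  # Cheney scan: the new heap is its
        obj = new_heap[scan]     # own breadth-first work queue
        obj.refs = [move(r, new_heap) for r in obj.refs]
        scan += 1
    return roots, new_heap       # garbage in old_heap is never touched

live = Cell(); root = Cell([live, live])
roots, heap = collect([root], [root, live, Cell()])  # third cell is garbage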
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-01 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 And yet Java code, for example, doesn’t have a reputation for greater 
 reliability compared to, say code written in Ada or C++, or even C. What is 
 the Java runtime written in? C. Why not use Java, if there is no speed 
 penalty in doing so?

The Java runtime (such as the garbage collector) needs untyped pointers
and can't be written in Java.  It might be possible to write a type-safe
GC in something like ATS, which has extremely precise types, but that is
almost alien technology by comparison to writing in C.

Java has considerably greater reputation for reliability than C or C++.
Ada is a different story, but Ada programs (because of the application
area Ada is used in) tend not to use a lot of dynamic memory allocation
in the first place.  A little googling shows there are GC extensions
available for Ada, though I don't know if they are used much.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-01 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 Whereas garbage collection will happen at some indeterminate time long after 
 the last access to the object, when it very likely will no longer be in the 
 cache, and have to be brought back in just to be freed, 

GC's for large systems generally don't free (or even examine) individual
garbage objects.  They copy the live objects to a new contiguous heap
without ever touching the garbage, and then they release the old heap.
That has the effect of improving locality, since the new heap is
compacted and has no dead objects.  The algorithms are generational
(they do frequent gc's on recently-created objects and less frequent
ones on older objects), so minor gc operations are on regions that fit
in cache, while major ones might have cache misses but are infrequent.

Non-compacting reference counting (or simple mark/sweep gc) has much
worse fragmentation and consequently worse cache locality than
copying-style gc.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-09-01 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 Refcounting is susceptible to the same pauses for reasons already
 discussed.

 Doesn’t seem to happen in the real world, though.

def f(n):
from time import time
a = [1] * n
t0 = time()
del a
t1 = time()
return t1 - t0

for i in range(9):
   print i, f(10**i)


on my system prints:

0 2.86102294922e-06
1 2.14576721191e-06
2 3.09944152832e-06
3 1.00135803223e-05
4 0.000104904174805
5 0.00098991394043
6 0.00413608551025
7 0.037693977356
8 0.362598896027

Looks pretty linear as n gets large.  0.36 seconds (the last line) is a
noticable pause.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-31 Thread Lawrence D'Oliveiro
In message 7xtymbzixt@ruckus.brouhaha.com, Paul Rubin wrote:

 It's pretty well established by now that GC doesn't have any significant
 speed penalty compared with manual allocation.  It does consume more
 memory, which is acceptable a lot of the time.  It certainly leads to
 more reliable programs.

And yet Java code, for example, doesn’t have a reputation for greater 
reliability compared to, say code written in Ada or C++, or even C. What is 
the Java runtime written in? C. Why not use Java, if there is no speed 
penalty in doing so?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-31 Thread Lawrence D'Oliveiro
In message 7x39tz42fd@ruckus.brouhaha.com, Paul Rubin wrote:

 Dennis Lee Bieber wlfr...@ix.netcom.com writes:

 Heap marking, OTOH, tends to run at indeterminate times, which could
 have an impact if one needs predictable response timings
 
 Reference counting has the same problem.  If you drop the last reference
 to a complex structure, it could take quite a long time to free all the
 components.

One difference is the interaction with caching behaviour. When a reference-
counted object is freed, the odds are that happens fairly soon after the 
last access, so the object will still be in the CPU cache, and access will 
be fast.

Whereas garbage collection will happen at some indeterminate time long after 
the last access to the object, when it very likely will no longer be in the 
cache, and have to be brought back in just to be freed, quite likely bumping 
out something else that the running program needs to access.

This is one reason why garbage collection is still considered an expensive 
technique. Computing power has improved by something like five orders of 
magnitude over the last half-century, making possible all kinds of 
productivity-enhancing techniques that were once considered expensive to 
become commonplace: high-level languages, dynamic memory allocation, stacks, 
hardware floating-point, memory protection and so on.

But alone of all of these, garbage collection still remains just as costly 
to implement as ever. That should tell you something about how poorly it 
matches the characteristics of modern computing hardware.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-31 Thread Lawrence D'Oliveiro
In message 7xmxs4uzjl@ruckus.brouhaha.com, Paul Rubin wrote:

 Gregory Ewing greg.ew...@canterbury.ac.nz writes:

 I'd be disappointed if CPython ditched refcounting and
 then became unsuitable for real-time games as a result.
 
 Refcounting is susceptible to the same pauses for reasons already
 discussed.

Doesn’t seem to happen in the real world, though.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 static void GetBufferInfo
   ( ...
 do /*once*/
   {
 TheBufferInfo = PyObject_CallMethod(FromArray, "buffer_info", "");
 if (TheBufferInfo == 0)
 break;
 AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
 LenObj = PyTuple_GetItem(TheBufferInfo, 1);
 if (PyErr_Occurred())
 break;
 ...
 Py_INCREF(AddrObj);
 Py_INCREF(LenObj);
   }
 while (false);
 Py_XDECREF(AddrObj);
 Py_XDECREF(LenObj);
 Py_XDECREF(TheBufferInfo);
   } /*GetBufferInfo*/

 It’s quite easy to assure yourself that this is never going to leak memory. 

Actually that code looks suspicious.  Suppose in

 AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
 LenObj = PyTuple_GetItem(TheBufferInfo, 1);

the first PyTuple_GetItem succeeds and the second one fails.  Then
AddrObj is a borrowed reference to the first tuple element and LenObj is
null, the error flag is set, so you break out of the do/while.  You then
decrement the refcount of AddrObj even though you didn't increment it.
Maybe there's an explanation that makes it ok somehow, but it still
looks messy.  This is the kind of problem I'm referring to in general.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Paul Rubin
Steven D'Aprano steve-remove-t...@cybersource.com.au writes:
 I'm not saying that ref counting systems can avoid incrementing and 
 decrementing the ref counts. That would be silly. But I'm saying that it 
 is an accident of implementation that writing C extensions requires you 
 to manually manage the counts yourself. That's a side-effect of 
 permitting coders to write C extensions in pure C, rather than in some 
 intermediate language which handles the ref counts for you but otherwise 
 compiles like C. If people cared enough, they could (probably) make the C 
 compiler handle it for them, just as it currently handles incrementing 
 and decrementing the return stack. 

I guess that is how the so-called smart pointers in the Boost C++
template library work.  I haven't used them so I don't have personal
experience with how convenient or reliable they are, or what kinds of
constraints they imposed on programming style.  I've always felt a bit
suspicious of them though, and I seem to remember Alex Martelli (I hope
he shows up here again someday) advising against using them.

I don't think a C compiler could really manage automatic decrementing
while still being C.  Think especially of the common style of exception
handling in C using longjmp.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Steven D'Aprano
On Mon, 30 Aug 2010 00:22:17 -0700, Paul Rubin wrote:

 I don't think a C compiler could really manage automatic decrementing
 while still being C.  Think especially of the common style of exception
 handling in C using longjmp.

You might very well be right. But that's the problem with C -- it's too 
low-level a language to expect the compiler to protect you much. Or at 
all. There will always be some use cases for managing memory yourself, or 
even managing the return stack (as you can do in Forth, for example), and 
so there will always need to be some sort of high-level assembler like C. 
But it astounds me that in 2010 people still routinely use C for normal, 
everyday application programming.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Gregory Ewing

Paul Rubin wrote:


These days I think the GC pause issue is overrated except for real-time
control applications.


Also for games, which are a fairly common application
these days. Even a few milliseconds can be too long when
you're trying to achieve smooth animation.

I'd be disappointed if CPython ditched refcounting and
then became unsuitable for real-time games as a result.

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Lawrence D'Oliveiro
In message 7xr5hg3a7s@ruckus.brouhaha.com, Paul Rubin wrote:

 Actually that code looks suspicious.  Suppose in
 
  AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
  LenObj = PyTuple_GetItem(TheBufferInfo, 1);
 
 the first PyTuple_GetItem succeeds and the second one fails.

Admittedly, I did take a shortcut here: array.buffer_info returns a tuple of 
two items, so I’m not expecting one GetItem to succeed and the other to 
fail.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Lawrence D'Oliveiro
In message 7x39twpuxi@ruckus.brouhaha.com, Paul Rubin wrote:

 Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:

 the CPython API means endlessly finding and fixing refcount bugs that
 lead to either crashes/security failures, or memory leaks.

 I don’t see why that should be so. It seems a very simple discipline to
 follow: initialize, allocate, free. Here’s an example snippet from my DVD
 Menu Animator http://github.com/ldo/dvd_menu_animator:
 
 In practice it has been a problem.

Maybe people need to learn to write code in a more structured fashion.

 If it hasn't happened to you yet, you're either burning a bunch of effort
 that programmers of more automatic systems can put to more productive
 uses ...

What makes you say that? Avoiding bugs is not a “productive use”?

 And how do you run such an application? You have to limit it to a
 predetermined amount of memory to begin with, otherwise it would easily
 gobble up everything you have.
 
 No that's usually not a problem-- the runtime system (generational gc)
 can figure out enough from your allocation pattern to prevent the heap
 from getting overlarge.

And yet Java apps, for example, are (in)famous for excessive memory usage 
compared to those written in non-GC-dependent languages.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
  AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
  LenObj = PyTuple_GetItem(TheBufferInfo, 1);
 
 the first PyTuple_GetItem succeeds and the second one fails.

 Admittedly, I did take a shortcut here: array.buffer_info returns a tuple of 
 two items, so I’m not expecting one GetItem to succeed and the other to 
 fail.

FromArray is a parameter to the function, with no type check to make
sure it's really an array.  In fact your code allows for the possibility
that it doesn't support the buffer_info operation (if I understand the
purpose of the null return check after the PyObject_CallMethod) which
means it's prepared for the argument to -not- be an array.  In that case
maybe it's some other object with a buffer_info operation that returns
a 1-element tuple.  If the function is callable from Python code, then
that arg type is completely out of the C code's control.  Even if it's
only callable from C, you're still depending on not one but two
different invariants (that the arg is an array, and that
array.buffer_info returns a 2-tuple) that are undocumented and unchecked
in the function.  I cannot agree with your claim that the approach
scales.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 If it hasn't happened to you yet, you're either burning a bunch of effort
 that programmers of more automatic systems can put to more productive
 uses ...

 What makes you say that? Avoiding bugs is not a “productive use”?

Avoiding any particular bug through constant and pervasive vigilance is
far less productive than using a system where causing that particular
type of bug is impossible to begin with.  IMO the code you posted has
latent bugs as discussed in the other post.  It might work at the moment
you checked it in but it is brittle.  I wouldn't have signed off on it
in a code review.

 And yet Java apps, for example, are (in)famous for excessive memory
 usage compared to those written in non-GC-dependent languages.

I think that may mostly be an issue with the bloated nature of most Java
apps.  Certainly Lisp systems have run in production for decades on
machines with much less memory than we would consider acceptable these
days for any substantial Python app.

It's probably true that copying-style gc is more memory hungry than
Python's refcount system.  Mark-sweep gc should have comparable memory
consumption and better speed than refcounting, though less speed than
copying.

JHC (experimental Haskell compiler) recently started using a mixture of
gc and region inference.  It will be interesting to see how that works
out.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Paul Rubin
Gregory Ewing greg.ew...@canterbury.ac.nz writes:
 These days I think the GC pause issue is overrated except for real-time
 control applications.

 Also for games, which are a fairly common application
 these days. Even a few milliseconds can be too long when
 you're trying to achieve smooth animation.

The usual hack with games is you do a major gc when the user advances
between game levels.  You can do minor gc's during the screen refresh
interval.
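
In CPython terms the equivalent knob is the gc module (sketch only;
the level object and its methods are made up):

import gc

def play_level(level):
    gc.disable()             # no cycle collections mid-level; plain
    try:                     # refcounting still reclaims most garbage
        while not level.finished:
            level.render_frame()
    finally:
        gc.enable()
        gc.collect()         # do the major collection between levels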

 I'd be disappointed if CPython ditched refcounting and
 then became unsuitable for real-time games as a result.

 Refcounting is susceptible to the same pauses for reasons already
discussed.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Aahz
In article 4c7b279d$0$28650$c3e8...@news.astraweb.com,
Steven D'Aprano  steve-remove-t...@cybersource.com.au wrote:
On Sun, 29 Aug 2010 17:52:38 -0700, Paul Rubin wrote:
Attribution lost:

 That's a problem with the CPython API, not reference counting. The
 problem is that the CPython API is written at too low a level, beneath
 that at which the garbage collector exists, so naturally you have to
 manually manage memory.
 
 Care to give an example of a reference counted system that's written any
 other way?

The complexity of the ref counter is invisible when writing pure Python 
code, and I believe it is also invisible when writing code in Cython. The 
difficulty of dealing with ref counts is abstracted away.

That's not completely true.  You know perfectly well that it's almost
trivially easy to leak memory with refcounting, and there are certain
ways in which Python leaks memory invisibly if you don't know how it
works.  One recent example at work was when someone was arguing with me
about whether we were leaking file handles and I had to prove that you
could leak file handles if you didn't clean up exceptions -- but that
cleaning up the exception *did* close the file handles.

(This code was originally written in Python 2.4, and partly because of
this we are making more of a push to use the "with" statement.)
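
A minimal sketch of the kind of leak in question, assuming Python 2
semantics (the error is contrived, but the mechanism is real):

    import sys

    def parse():
        f = open(__file__)               # an open handle held in a local
        raise ValueError("bad record")   # f.close() never runs

    try:
        parse()
    except ValueError:
        pass
    # The caught exception is still reachable via sys.exc_info(), and its
    # traceback references parse()'s frame, whose local f keeps the file
    # handle open.  Cleaning up the exception releases it:
    sys.exc_clear()   # Python 2 only; the refcount drops and the file closes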

If you're restricting your claim just to the actual management of the
reference counter, you're correct, but it's especially not clear that
your second sentence is so restricted.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

...if I were on life-support, I'd rather have it run by a Gameboy than a
Windows box.  --Cliff Wells
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Lawrence D'Oliveiro
In message 7xr5hguzzi@ruckus.brouhaha.com, Paul Rubin wrote:

 JHC (experimental Haskell compiler) recently started using a mixture of
 gc and region inference.  It will be interesting to see how that works
 out.

That’s what people have been saying about garbage collection for about half 
a century: “this new experimental technique will solve all the problems, 
just you wait and see”.

Meanwhile, real-world programmers get on to writing real-world code that is 
productive and efficient on real-world systems.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-30 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 Meanwhile, real-world programmers get on to writing real-world code that is 
 productive and efficient on real-world systems.

It's pretty well established by now that GC doesn't have any significant
speed penalty compared with manual allocation.  It does consume more
memory, which is acceptable a lot of the time.  It certainly leads to
more reliable programs.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-29 Thread Hans Mulder

Steven D'Aprano wrote:

On Sat, 28 Aug 2010 00:33:10 -0700, Paul Rubin wrote:



If you drop the last reference
to a complex structure, it could take quite a long time to free all the
components.  By contrast there are provably real-time tracing gc
schemes, including some parallelizable ones.


I could be wrong, but how can they not be subject to the same performance 
issue? If you have twenty thousand components that all have to be freed, 
they all have to be freed whether you do it when the last reference is 
cleared, or six seconds later when the gc does a sweep.


Parallelizable garbage collectors have performance issues, but they're
not the same issues as mark-sweep collectors have.  Parallelizable GCs
break up their work in a zillion little pieces and allow the VM to do
some real work after each piece.  They won't free your twenty thousand
components all in one go and you won't have that embarrassing pause.

Parallelizable garbage collectors require some careful coordination
between the GC and the VM.  This takes CPU time, so on the whole they're
slower than traditional garbage collectors.  So instead of unpredictable
embarrassing pauses, you have a VM that's consistently slow.
For some applications consistency is more important than raw speed and
for these applications parallelizable GCs are an improvement.

HTH,

-- HansM
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-29 Thread Paul Rubin
Hans Mulder han...@xs4all.nl writes:
 Parallelizable garbage collectors have performance issues, but they're
 not the same issues as mark-sweep collectors have.  Parallelizable GCs
 break up their work in a zillion little pieces and allow the VM to do
 some real work after each piece.  They won't free your twenty thousand
 components all in one go and you won't have that embarrassing pause.

Quibble: parallel GC just means any GC that runs in multiple threads
simultaneously.  A GC that guarantees no pauses (more technically,
specifies a constant such that any GC pause is guaranteed to be shorter
than the constant) is called a real time GC, and real-time GC's are
usually single threaded.  Parallel real-time GC's were once sort of a
holy grail, though workable ones have been implemented.  GHC for example
currently uses a GC that is parallel (runs on multiple cores for speed)
but is not real-time (there can be a pause), and I think the Sun JVM is
the same way.

These days I think the GC pause issue is overrated except for real-time
control applications.  GC's no longer frequently make the program freeze
up for seconds at a time or anything like that.  It was worse in the old
days when CPU's were slower and memory was scarcer.  Serious GC's are
usually generational, with minor GC's taking microseconds and less
frequent major GC's taking fractions of a second.  So in an
interactive program or animation on your desktop you might notice a
little hiccup once in a while.  For something like a web server an
occasional slight variation in response time could easily be random
network delays and so forth.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-29 Thread Paul Rubin
Steven D'Aprano st...@remove-this-cybersource.com.au writes:
 You can add cycle detection to a reference count gc, at the cost of more 
 complexity.

But then it's not purely a refcount gc. ;)

 If you read the Wikipedia article I linked to, tracing algorithms can 
 also be unsound:  [describes conservative gc]

Yeah, whether that counts as a real GC is subjective too.

 and 2) it requires constant attention from the mutator to incr and
 decr the reference counts.
 Yes. And?

I thought I made it clear, the purpose of gc is to improve programmer
productivity and program reliability by relieving the programmer from
the burden of memory allocation bookkeeping.  If the programmer has to
painstakingly manage refcounts by hand and the resulting programs are
still prone to refcount bugs (both of which are the case with CPython),
it's not really gc in a way that lives up to the name.

 That's a problem with the CPython API, not reference counting. The 
 problem is that the CPython API is written at too low a level, beneath 
 that at which the garbage collector exists, so naturally you have to 
 manually manage memory.

Care to give an example of a reference counted system that's written any
other way?

 On the other hand, tracing gcs have their own set of problems too, mostly 
 due to the use of finalizers and attempts to make garbage collection run 
 more predictably. See here:

I think it's fairly common wisdom that finalizers and gc don't interact
very well.  Finalizers in general aren't in the gc spirit, which says
the system should give the illusion that every object stays around
forever.  As for stuff like hash tables, a usual way to help out the gc
is by populating the hash table with weak references, rather than by
clearing it out manually when you're done with it.  It's also possible
for fancy compilers to use a mixture of gc and stack- or region-based
allocation.
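
For example, a minimal sketch with the stdlib weakref module:

    import weakref

    class Image(object):
        pass

    cache = weakref.WeakValueDictionary()   # entries don't keep values alive

    img = Image()
    cache['banner'] = img
    print 'banner' in cache   # True -- a strong reference still exists
    del img                   # drop the last strong reference
    print 'banner' in cache   # False: the entry fell out on its own
                              # (immediately under CPython's refcounting)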

 Tracing garbage collectors aren't a panacea. They're software themselves, 
 and complex software, which means they're subject to bugs like the one ...

You could say the same thing about compilers instead of gc's.  The idea
in both cases is yes, they're complex, but they put the complexity in
one place, so that the application program can rely on the complicated
gc or compiler features while itself staying simple.  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-29 Thread Paul Rubin
Steven D'Aprano st...@remove-this-cybersource.com.au writes:
 I could be wrong, but how can they not be subject to the same performance 
 issue? If you have twenty thousand components that all have to be freed, 
 they all have to be freed whether you do it when the last reference is 
 cleared, or six seconds later when the gc does a sweep.

GC's on large machines these days do not sweep at all.  They copy the
live data to a new heap, then release the old heap.  Because there is
usually more garbage than live data, copying GC's that never touch the
garbage are usually faster than any scheme involving freeing unused
objects one by one.
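
A toy sketch of the idea -- a Cheney-style semispace collector, with a
list standing in for the heap and a Ref class standing in for pointers
(real collectors do this on raw memory, of course):

    class Ref(object):
        def __init__(self, index):
            self.index = index      # an "address" into the heap list

    def collect(from_heap, roots):
        to_heap = []                # the new semispace
        forward = {}                # old index -> new index (forwarding table)

        def evacuate(ref):
            if ref.index not in forward:
                forward[ref.index] = len(to_heap)
                to_heap.append(dict(from_heap[ref.index]))  # copy live object
            return Ref(forward[ref.index])

        new_roots = [evacuate(r) for r in roots]
        scan = 0
        while scan < len(to_heap):  # to_heap doubles as the work queue
            obj = to_heap[scan]
            for field, val in obj.items():
                if isinstance(val, Ref):
                    obj[field] = evacuate(val)  # chase and rewrite pointers
            scan += 1
        return to_heap, new_roots

Everything unreachable simply stays behind in from_heap and is dropped
wholesale, so the cost is proportional to the live data, not the garbage.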
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-29 Thread Steven D'Aprano
On Sun, 29 Aug 2010 17:52:38 -0700, Paul Rubin wrote:

 That's a problem with the CPython API, not reference counting. The
 problem is that the CPython API is written at too low a level, beneath
 that at which the garbage collector exists, so naturally you have to
 manually manage memory.
 
 Care to give an example of a reference counted system that's written any
 other way?

The complexity of the ref counter is invisible when writing pure Python 
code, and I believe it is also invisible when writing code in Cython. The 
difficulty of dealing with ref counts is abstracted away.

The argument that "it will work if we always do this" means that "it won't
work" is a nice quip, but it doesn't stand up. It's possible to defeat
even Pascal compilers' type checking and do unsafe things by dropping 
down into assembly. So don't do it! It's not hard to not do something, 
although sometimes you might choose to do it anyway, because otherwise 
there is no way to get the code you need/want. But such code should be a 
rare exception.

I'm not saying that ref counting systems can avoid incrementing and 
decrementing the ref counts. That would be silly. But I'm saying that it 
is an accident of implementation that writing C extensions requires you 
to manually manage the counts yourself. That's a side-effect of 
permitting coders to write C extensions in pure C, rather than in some 
intermediate language which handles the ref counts for you but otherwise 
compiles like C. If people cared enough, they could (probably) make the C 
compiler handle it for them, just as it currently handles incrementing 
and decrementing the return stack. 

There's nothing *fundamental* to the idea of ref counting that says that 
you must handle the counts manually -- it depends on how well insulated 
you are from the underlying machine.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-29 Thread Lawrence D'Oliveiro
In message 7x4oeftuk4@ruckus.brouhaha.com, Paul Rubin wrote:

 I'd say [reference-counting is] not real gc because 1) it's unsound
 (misses reference cycles), and 2) it requires constant attention from the
 mutator to incr and decr the reference counts.  So developing modules for
 the CPython API means endlessly finding and fixing refcount bugs that lead
 to either crashes/security failures, or memory leaks.

I don’t see why that should be so. It seems a very simple discipline to 
follow: initialize, allocate, free. Here’s an example snippet from my DVD 
Menu Animator http://github.com/ldo/dvd_menu_animator:

static void GetBufferInfo
  (
    PyObject * FromArray,
    unsigned long * addr,
    unsigned long * len
  )
  /* returns the address and length of the data in a Python array object. */
  {
    PyObject * TheBufferInfo = 0;
    PyObject * AddrObj = 0;
    PyObject * LenObj = 0;
    do /*once*/
      {
        TheBufferInfo = PyObject_CallMethod(FromArray, "buffer_info", "");
        if (TheBufferInfo == 0)
            break;
        /* buffer_info returns (address, length) */
        AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
        LenObj = PyTuple_GetItem(TheBufferInfo, 1);
        if (PyErr_Occurred())
            break;
        /* PyTuple_GetItem returns borrowed references -- take our own */
        Py_INCREF(AddrObj);
        Py_INCREF(LenObj);
        *addr = PyInt_AsUnsignedLongMask(AddrObj);
        *len = PyInt_AsUnsignedLongMask(LenObj);
        if (PyErr_Occurred())
            break;
      }
    while (false);
    /* Py_XDECREF tolerates nulls, so every exit path is covered */
    Py_XDECREF(AddrObj);
    Py_XDECREF(LenObj);
    Py_XDECREF(TheBufferInfo);
  } /*GetBufferInfo*/

It’s quite easy to assure yourself that this is never going to leak memory. 
More complicated examples can simply nest constructs like this one inside
another to arbitrary depth, while still giving the same assurance at every
level. In short, this technique scales well.

 If you program the Java JNI or a typical Lisp FFI, you'll find that real
 gc is a lot simpler to use since you avoid all the refcount maintenance
 hassles.  You allocate memory and shut your eyes, and the gc takes care of
 freeing it when it figures out that you are done.

And how do you run such an application? You have to limit it to a 
predetermined amount of memory to begin with, otherwise it would easily 
gobble up everything you have.

In the old days of “classic” MacOS, every application had to run in a fixed-
size application heap. I have no wish to return to those days.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-29 Thread Paul Rubin
Lawrence D'Oliveiro l...@geek-central.gen.new_zealand writes:
 the CPython API means endlessly finding and fixing refcount bugs that lead
 to either crashes/security failures, or memory leaks.

 I don’t see why that should be so. It seems a very simple discipline to 
 follow: initialize, allocate, free. Here’s an example snippet from my DVD 
 Menu Animator http://github.com/ldo/dvd_menu_animator:

In practice it has been a problem.  If it hasn't happened to you yet,
you're either burning a bunch of effort that programmers of more
automatic systems can put to more productive uses, or else you just
haven't written enough such code to have gotten bitten yet.

 You allocate memory and shut your eyes, and the gc takes care of
 freeing it when it figures out that you are done.

 And how do you run such an application? You have to limit it to a 
 predetermined amount of memory to begin with, otherwise it would easily 
 gobble up everything you have.

No, that's usually not a problem -- the runtime system (generational gc)
can figure out enough from your allocation pattern to prevent the heap
from getting overlarge.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-28 Thread Paul Rubin
Dennis Lee Bieber wlfr...@ix.netcom.com writes:
   The nice thing about it [reference counting] is that it is sort
 of deterministic -- one can examine code and determine that an object
 is collected at some point in the execution...
   Heap marking, OTOH, tends to run at indeterminate times, which could
 have an impact if one needs predictable response timings

Reference counting has the same problem.  If you drop the last reference
to a complex structure, it could take quite a long time to free all the
components.  By contrast there are provably real-time tracing gc
schemes, including some parallelizable ones.  One reason CPython still
can't run threads on parallel cores is it would have to lock the
reference counts every time they're updated, and the slowdown from that
is terrible.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-28 Thread Steven D'Aprano
On Sat, 28 Aug 2010 00:33:10 -0700, Paul Rubin wrote:

 Dennis Lee Bieber wlfr...@ix.netcom.com writes:
  The nice thing about it [reference counting] is that it is sort
 of deterministic -- one can examine code and determine that an object
 is collected at some point in the execution...
 Heap marking, OTOH, tends to run at indeterminate times, which could
 have an impact if one needs predictable response timings
 
 Reference counting has the same problem.  

In theory, yes, but in practice ref counting tends to spread out the 
performance impact more smoothly. There are exceptions, such as the one 
you mention below, but as a general rule ref counting isn't subject to 
the embarrassing pauses that tracing garbage collectors tend to be 
subject to.


 If you drop the last reference
 to a complex structure, it could take quite a long time to free all the
 components.  By contrast there are provably real-time tracing gc
 schemes, including some parallelizable ones.

I could be wrong, but how can they not be subject to the same performance 
issue? If you have twenty thousand components that all have to be freed, 
they all have to be freed whether you do it when the last reference is 
cleared, or six seconds later when the gc does a sweep.


 One reason CPython still
 can't run threads on parallel cores is it would have to lock the
 reference counts every time they're updated, and the slowdown from that
 is terrible.

On the other hand, the reason that CPython still has reference counting 
is that the alternatives tried so far are unacceptably slow for
non-threaded code.




-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-28 Thread Steven D'Aprano
On Fri, 27 Aug 2010 18:06:19 -0700, Paul Rubin wrote:

 Steven D'Aprano st...@remove-this-cybersource.com.au writes:
 I've repeatedly asked, both here and elsewhere, why reference counting
 isn't real garbage collection. Nobody has been able to give me a
 satisfactory answer. As far as I can tell, it's a bit of
 pretentiousness with no basis in objective fact.
 
 Well, it's a bit of a subjective matter.  I'd say it's not real gc
 because 1) it's unsound (misses reference cycles), 

You can add cycle detection to a reference count gc, at the cost of more 
complexity.

If you read the Wikipedia article I linked to, tracing algorithms can 
also be unsound:

Some collectors running in a particular environment can 
correctly identify all pointers (references) in an object; 
these are called precise (also exact or accurate) 
collectors, the opposite being a conservative or partly
conservative collector. Conservative collectors have to 
assume that any bit pattern in memory could be a pointer if 
(when interpreted as a pointer) it would point into any 
allocated object. Thus, conservative collectors may have 
some false negatives, where storage is not released because 
of accidental fake pointers...

http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)



 and 2) it requires
 constant attention from the mutator to incr and decr the reference
 counts.

Yes. And?


 So developing modules for the CPython API means endlessly
 finding and fixing refcount bugs that lead to either crashes/security
 failures, or memory leaks.  If you program the Java JNI or a typical
 Lisp FFI, you'll find that real gc is a lot simpler to use since you
 avoid all the refcount maintenance hassles.  You allocate memory and
 shut your eyes, and the gc takes care of freeing it when it figures out
 that you are done.  Refcounting is basically a form of manual memory
 management, while gc is automatic.


That's a problem with the CPython API, not reference counting. The 
problem is that the CPython API is written at too low a level, beneath 
that at which the garbage collector exists, so naturally you have to 
manually manage memory.


 Someone said here recently that as a program gets larger, saying "this
 will work as long as we do X every time without fail" becomes equal to
 saying "this won't work".  Substitute "properly maintain all ref counts"
 for X and you can see the problem.  I've seen released production
 tested Python C modules with subtle refcount bugs on more than one
 occasion.  In gc'd systems there are fewer places for the code to go
 wrong.

On the other hand, tracing gcs have their own set of problems too, mostly 
due to the use of finalizers and attempts to make garbage collection run 
more predictably. See here:

http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/topic/com.ibm.java.doc.diagnostics.142j9/html/coexistwithgc.html

Quote:

For tidying Java resources, think about the use of a clean 
up routine. When you have finished with an object, call the 
routine to null out all references, deregister listeners, 
clear out hash tables, and so on. This is far more efficient 
than using a finalizer and has the useful side-benefit of 
speeding up garbage collection. The Garbage Collector does 
not have so many object references to chase in the next 
garbage collection cycle.


Translated: Rather than relying on the garbage collector to clean up 
resources after you, do it yourself, manually, so the garbage collector 
has less work to do.

Tracing garbage collectors aren't a panacea. They're software themselves, 
and complex software, which means they're subject to bugs like the one 
which plagued Flash plugin 9:

http://gskinner.com/blog/archives/2008/04/failure_to_unlo.html

The more complicated the garbage collector, the more scope you have for 
some interaction between your high-level code and the gc leading to 
memory not be reclaimed or extreme slowdown. Like this:

http://tech.puredanger.com/2009/02/11/linkedblockingqueue-garbagecollection/





-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-28 Thread Aahz
In article 4c78572c$0$28655$c3e8...@news.astraweb.com,
Steven D'Aprano  st...@remove-this-cybersource.com.au wrote:
On Fri, 27 Aug 2010 09:16:52 -0700, Aahz wrote:
 In article mailman.1967.1281549328.1673.python-l...@python.org, MRAB 
 pyt...@mrabarnett.plus.com wrote:

An object will be available for garbage collection when nothing refers
to it either directly or indirectly. If it's unreferenced then it will
go away.
 
 This isn't actually garbage collection as most people think of it.
 Refcounting semantics mean that objects get reaped as soon as nothing
 points at them.  OTOH, CPython does also have garbage collection to back
 up refcounting so that when you have unreferenced object cycles they
 don't stay around.

I've repeatedly asked, both here and elsewhere, why reference counting 
isn't real garbage collection. Nobody has been able to give me a 
satisfactory answer. As far as I can tell, it's a bit of pretentiousness 
with no basis in objective fact.

You'll notice that I was very careful to qualify my statement with "as
most people think of it".  Also, because CPython has two different memory
management mechanisms, refcounting and cycle detection, and the module
that controls cycle detection is called "gc", I think it's simpler to
follow along with the Python docs -- and critically important to remind
people that there are in fact two different systems.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

...if I were on life-support, I'd rather have it run by a Gameboy than a
Windows box.  --Cliff Wells
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-28 Thread Aahz
In article 4c78e7a5$0$28655$c3e8...@news.astraweb.com,
Steven D'Aprano  st...@remove-this-cybersource.com.au wrote:

On the other hand, the reason that CPython still has reference counting
is that the alternatives tried so far are unacceptably slow for
non-threaded code.

No, it's *a* reason, the other main reason being that refcounting is much
easier for a random C library.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

...if I were on life-support, I'd rather have it run by a Gameboy than a
Windows box.  --Cliff Wells
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-27 Thread Aahz
In article mailman.1967.1281549328.1673.python-l...@python.org,
MRAB  pyt...@mrabarnett.plus.com wrote:

An object will be available for garbage collection when nothing refers
to it either directly or indirectly. If it's unreferenced then it will
go away.

This isn't actually garbage collection as most people think of it.
Refcounting semantics mean that objects get reaped as soon as nothing
points at them.  OTOH, CPython does also have garbage collection to back
up refcounting so that when you have unreferenced object cycles they
don't stay around.
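
A quick demonstration of the backup collector at work:

    import gc

    class Node(object):
        pass

    a = Node(); b = Node()
    a.other = b; b.other = a   # a reference cycle: refcounts never reach zero
    del a, b                   # unreferenced, but refcounting can't reap them
    print gc.collect()         # nonzero: the cycle detector found and freed
                               # the unreachable objects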
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

...if I were on life-support, I'd rather have it run by a Gameboy than a
Windows box.  --Cliff Wells
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-27 Thread John Nagle

On 8/11/2010 1:26 PM, EW wrote:

On Aug 11, 2:52 pm, Paul Rubin no.em...@nospam.invalid  wrote:

EW ericwoodwo...@gmail.com  writes:

Well I cared because I thought garbage collection would only happen
when the script ended - the entire script.  Since I plan on running
this as a service it'll run for months at a time without ending.  So I
thought I was going to have heaps of Queues hanging out in memory,
unreferenced and unloved.  It seemed like bad practice so I wanted to
get out ahead of it.


Even if GC worked that way it wouldn't matter, if you use just one queue
per type of task.  That number should be a small constant so the memory
consumption is small.


Well I can't really explain it but 1 Queue per task for what I'm
designing just doesn't feel right to me.  It feels like it will lack
future flexibility.  I like having 1 Queue per producer thread object
and the person instantiating that object can do whatever he wants with
that Queue.  I can't prove I'll need that level of flexibility but I
don't see why it's bad to have.  It's still a small number of Queues,
it's just a small, variable, number of Queues.


That's backwards.  Usually, you want one queue per unique consumer.
That is, if you have a queue that contains one kind of request,
there's one thread reading the queue, blocked until some other
thread puts something on the queue.  No polling is needed.

One consumer reading multiple queues is difficult to implement
well.

Note, by the way, that CPython isn't really concurrent.  Only
one thread runs at a time, due to an archaic implementation.  So
if your threads are compute-bound, even on a multicore CPU threading
will not help.

There's a multiprocessing module which allows spreading work
over several processes instead of threads.  That can be helpful
as a workaround.
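
For instance, a minimal sketch (Python 2.6+; crunch is a stand-in for
real compute-bound work):

    import multiprocessing

    def crunch(n):
        return sum(i * i for i in xrange(n))    # CPU-bound work

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=4)  # real OS processes
        print pool.map(crunch, [10 ** 6] * 8)     # spreads across the cores
        pool.close()
        pool.join()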

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-27 Thread Steven D'Aprano
On Fri, 27 Aug 2010 09:16:52 -0700, Aahz wrote:

 In article mailman.1967.1281549328.1673.python-l...@python.org, MRAB 
 pyt...@mrabarnett.plus.com wrote:

An object will be available for garbage collection when nothing refers
to it either directly or indirectly. If it's unreferenced then it will
go away.
 
 This isn't actually garbage collection as most people think of it.
 Refcounting semantics mean that objects get reaped as soon as nothing
 points at them.  OTOH, CPython does also have garbage collection to back
 up refcounting so that when you have unreferenced object cycles they
 don't stay around.


I've repeatedly asked, both here and elsewhere, why reference counting 
isn't real garbage collection. Nobody has been able to give me a 
satisfactory answer. As far as I can tell, it's a bit of pretentiousness 
with no basis in objective fact.

http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
http://en.wikipedia.org/wiki/Reference_counting

Reference counting is one specific kind of garbage collection, and like 
all gc strategies, it has strengths as well as weaknesses. It is *simple* 
to implement (which may be why a certain class of programmer likes to 
think it isn't "real" gc). When it runs is deterministic, and reclamation
is immediate once the last reference is dropped. The overhead is very light (a
plus) and continuous (which can be both a plus and a minus). It is better 
suited to managing scarce resources like open files than are tracing 
garbage collectors. It avoids the embarrassing pause of tracing 
collectors. It doesn't deal well with reference cycles, and (at least 
with Python's implementation of ref counting) it causes performance 
issues with threaded applications.
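
The determinism is easy to see under CPython -- a minimal sketch:

    class Noisy(object):
        def __del__(self):
            print "reclaimed"

    n = Noisy()
    del n   # prints "reclaimed" immediately under CPython's refcounting;
            # a tracing collector would get around to it at some later pass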

So CPython has two garbage collectors -- a reference counting 
implementation, and a tracing implementation. Jython and IronPython use 
the native garbage collectors from Java and .Net. Other Pythons may use 
something else.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-27 Thread Paul Rubin
Steven D'Aprano st...@remove-this-cybersource.com.au writes:
 I've repeatedly asked, both here and elsewhere, why reference counting 
 isn't real garbage collection. Nobody has been able to give me a 
 satisfactory answer. As far as I can tell, it's a bit of pretentiousness 
 with no basis in objective fact.

Well, it's a bit of a subjective matter.  I'd say it's not real gc
because 1) it's unsound (misses reference cycles), and 2) it requires
constant attention from the mutator to incr and decr the reference
counts.  So developing modules for the CPython API means endlessly
finding and fixing refcount bugs that lead to either crashes/security
failures, or memory leaks.  If you program the Java JNI or a typical
Lisp FFI, you'll find that real gc is a lot simpler to use since you
avoid all the refcount maintenance hassles.  You allocate memory and
shut your eyes, and the gc takes care of freeing it when it figures out
that you are done.  Refcounting is basically a form of manual memory
management, while gc is automatic.  

Someone said here recently that as a program gets larger, saying this
will work as long as we do X every time without fail becomes equal to
saying this won't work.  Substitute properly maintain all ref counts
for X and you can see the problem.  I've seen released production
tested Python C modules with subtle refcount bugs on more than one
occasion.  In gc'd systems there are fewer places for the code to go
wrong.
-- 
http://mail.python.org/mailman/listinfo/python-list


Queue cleanup

2010-08-11 Thread EW
Hi

I'm writing a multithreaded app that relies on Queues to move data
between the threads.  I'm trying to write my objects in a general way
so that I can reuse them in the future so I need to write them in such
a way that I don't know how many producer and how many consumer
threads I might need.  I also might have different consumer threads do
different tasks (for example one might write to a log and one might
write to SQL) so that again means I can't plan for a set ratio of
consumers to producers.  So it's unknown.

So this means that instead of having 1 Queue that all the producers
put to and that all the consumers get from I actually have 1 Queue per
producer thread  that the main body sends to the correct type of
consumer thread.  So I could get something like this where 3 producer
threads write to 3 different Queues all of which get read by 1
consumer thread:

P1    P2   P3
     \    |   /
       \  |  /
        C1

So producers 1, 2, and 3 all write to individual Queues and consumer 1
has a list of those Queues and reads them all.  The problem I'm having
is that those producer threads can come and go pretty quickly and when
they die I can cleanup the thread with join() but I'm still left with
the Queue.  So I could get something like this:

P1         P3
     \    |   /
       \  |  /
        C1

So here the P2 thread has ended and gone away but I still have his
Queue lingering.

So on a thread I can use is_alive() to check status and use join() to
clean up but I don't see any analogous functionality for Queues.  How
do I kill them?  I thought about putting a suicide message on the
Queue and then C1 would read it and set the variable to None but I'm
not sure setting the variable to None actually makes the Queue go
away.  It could just end up sitting in memory unreferenced - and
that's not good.  Additionally, I could have any number of consumer
threads reading that Queue so once the first one gets the suicide note
the other consumer threads never would.

I figure there has to be an elegant way for managing my Queues but so
far I can't find it.  Any suggestions would be appreciated and thanks
in advance for any help.


ps Python rocks.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread EW
On Aug 11, 12:55 pm, EW ericwoodwo...@gmail.com wrote:
 Hi

 I'm writing a multithreaded app that relies on Queues to move data
 between the threads.  I'm trying to write my objects in a general way
 so that I can reuse them in the future so I need to write them in such
 a way that I don't know how many producer and how many consumer
 threads I might need.  I also might have different consumer threads do
 different tasks (for example one might write to a log and one might
 write to SQL) so that again means I can't plan for a set ratio of
 consumers to producers.  So it's unknown.

 So this means that instead of having 1 Queue that all the producers
 put to and that all the consumers get from I actually have 1 Queue per
 producer thread  that the main body sends to the correct type of
 consumer thread.  So I could get something like this where 3 producer
 threads write to 3 different Queues all of which get read by 1
 consumer thread:

 P1    P2   P3
      \    |   /
        \  |  /
         C1

 So producers 1, 2, and 3 all write to individual Queues and consumer 1
 has a list of those Queues and reads them all.  The problem I'm having
 is that those producer threads can come and go pretty quickly and when
 they die I can cleanup the thread with join() but I'm still left with
 the Queue.  So I could get something like this:

 P1         P3
      \    |   /
        \  |  /
         C1

 So here the P2 thread has ended and gone away but I still have his
 Queue lingering.

 So on a thread I can use is_alive() to check status and use join() to
 clean up but I don't see any analogous functionality for Queues.  How
 do I kill them?  I thought about putting a suicide message on the
 Queue and then C1 would read it and set the variable to None but I'm
 not sure setting the variable to None actually makes the Queue go
 away.  It could just end up sitting in memory unreferenced - and
 that's not good.  Additionally, I could have any number of consumer
 threads reading that Queue so once the first one gets the suicide note
 the other consumer threads never would.

 I figure there has to be an elegant way for managing my Queues but so
 far I can't find it.  Any suggestions would be appreciated and thanks
 in advance for any help.

 ps Python rocks.

Whoo... the formatting got torn up!  My terrible diagrams are even more
terrible!  Oh well, I think you'll catch my meaning   :)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread Paul Rubin
EW ericwoodwo...@gmail.com writes:
 I also might have different consumer threads do
 different tasks (for example one might write to a log and one might
 write to SQL) so that again means I can't plan for a set ratio of
 consumers to producers.  So it's unknown.

 So this means that instead of having 1 Queue that all the producers
 put to and that all the consumers get from I actually have 1 Queue per
 producer thread 

That doesn't sound appropriate.  Queues can have many readers and many
writers.  So use one queue per task (logging, SQL, etc), regardless of
the number of producer or consumer threads.  Any producer with an SQL
request sends it to the SQL queue, which can have many listeners.  The
different SQL consumer threads listen to the SQL queue and pick up
requests and handle them.
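
As a minimal sketch (run_sql is a stand-in for the real work, and the
None sentinel for shutdown is just one common convention):

    import threading, Queue      # Python 2 module names

    sql_queue = Queue.Queue()    # one queue for the whole SQL task type

    def sql_worker(q):
        while True:
            request = q.get()
            if request is None:  # shutdown sentinel
                q.put(None)      # pass it along so the other workers see it
                return
            run_sql(request)     # hypothetical: execute against the database

    workers = [threading.Thread(target=sql_worker, args=(sql_queue,))
               for _ in xrange(3)]          # three listeners, one queue
    for w in workers:
        w.start()

    sql_queue.put("INSERT INTO log VALUES (1)")  # any producer can do this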
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread EW
On Aug 11, 1:18 pm, Paul Rubin no.em...@nospam.invalid wrote:
 EW ericwoodwo...@gmail.com writes:
  I also might have different consumer threads do
  different tasks (for example one might write to a log and one might
  write to SQL) so that again means I can't plan for a set ratio of
  consumers to producers.  So it's unknown.

  So this means that instead of having 1 Queue that all the producers
  put to and that all the consumers get from I actually have 1 Queue per
  producer thread

 That doesn't sound appropriate.  Queues can have many readers and many
 writers.  So use one queue per task (logging, SQL, etc), regardless of
 the number of producer or consumer threads.  Any producer with an SQL
 request sends it to the SQL queue, which can have many listeners.  The
 different SQL consumer threads listen to the SQL queue and pick up
 requests and handle them.

I thought about doing it that way and I could do it that way but it
still seems like there should be a way to clean up Queues on my own.
If I did it this way then I guess I'd be relying on garbage collection
when the script ended to clean up the Queues for me.

What if I want to clean up my own Queues?  Regardless of the specifics
of my current design, I'm just generally curious how people manage
cleanup of their Queues when they don't want them any more.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread MRAB

EW wrote:
[snip]

So here the P2 thread has ended and gone away but I still have his
Queue lingering.

So on a thread I can use is_alive() to check status and use join() to
clean up but I don't see any analogous functionality for Queues.  How
do I kill them?  I thought about putting a suicide message on the
Queue and then C1 would read it and set the variable to None but I'm
not sure setting the variable to None actually makes the Queue go
away.  It could just end up sitting in memory unreferenced - and
that's not good.  Additionally, I could have any number of consumer
threads reading that Queue so once the first one gets the suicide note
the other consumer threads never would.

I figure there has to be an elegant way for managing my Queues but so
far I can't find it.  Any suggestions would be appreciated and thanks
in advance for any help.


An object will be available for garbage collection when nothing refers
to it either directly or indirectly. If it's unreferenced then it will
go away.

As for the suicide note, if a consumer sees it then it can put it back
into the queue so other consumers will see it and then forget about the
queue (set the variable which refers to the queue to None, or, if the
references are in a list, delete it from the list).
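
A minimal sketch of that scheme (handle is a stand-in for the per-item
work):

    import Queue

    SUICIDE = object()   # a unique, unmistakable note

    def consume(my_queues):
        while my_queues:
            for q in list(my_queues):
                try:
                    item = q.get(timeout=0.1)
                except Queue.Empty:
                    continue
                if item is SUICIDE:
                    q.put(SUICIDE)        # put it back for other consumers
                    my_queues.remove(q)   # forget it; gc reaps the queue
                else:
                    handle(item)          # hypothetical per-item work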
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread EW
On Aug 11, 1:55 pm, MRAB pyt...@mrabarnett.plus.com wrote:
 EW wrote:

 [snip]



  So here the P2 thread has ended and gone away but I still have his
  Queue lingering.

  So on a thread I can use is_alive() to check status and use join() to
  clean up but I don't see any analogous functionality for Queues.  How
  do I kill them?  I thought about putting a suicide message on the
  Queue and then C1 would read it and set the variable to None but I'm
  not sure setting the variable to None actually makes the Queue go
  away.  It could just end up sitting in memory unreferenced - and
  that's not good.  Additionally, I could have any number of consumer
  threads reading that Queue so once the first one gets the suicide note
  the other consumer threads never would.

  I figure there has to be an elegant way for managing my Queues but so
  far I can't find it.  Any suggestions would be appreciated and thanks
  in advance for any help.

 An object will be available for garbage collection when nothing refers
 to it either directly or indirectly. If it's unreferenced then it will
 go away.

 As for the suicide note, if a consumer sees it then it can put it back
 into the queue so other consumers will see it and then forget about the
 queue (set the variable which refers to the queue to None, or, if the
 references are in a list, delete it from the list).

Ok great.  I wasn't sure about the garbage collection part of it.
That's actually pretty easy.

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread Paul Rubin
EW ericwoodwo...@gmail.com writes:
 I thought about doing it that way and I could do it that way but it
 still seems like there should be a way to clean up Queues on my own.
 If I did it this way then I guess I'd be relying on garbage collection
 when the script ended to clean up the Queues for me.

Oh, I see.  As long as it's possible to start new producer or consumer
threads that touch a queue, obviously that queue has to still be around.
If the program starts all its threads at the beginning, then runs til
they exit, then does more stuff, then you could do something like:

    # make a dictionary of queues, one queue per task type
    queues = {'sql': Queue(), 'logging': Queue()}   # ...one entry per task

    for i in range(5):   # or however many threads you want
        threading.Thread(target=your_handler, args=[queues]).start()

    del queues

and then when all the threads exit, there are no remaining references to
the queues.  But why do you care?  Queues aren't gigantic structures,
they're just a list (collections.deque) with an rlock.  It's fine to let
the gc clean them up; that's the whole point of having a gc in the first
place.
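
You can see as much for yourself in CPython 2.x:

    import Queue

    q = Queue.Queue()
    print type(q.queue)   # <type 'collections.deque'> -- the underlying list
    print type(q.mutex)   # the lock that guards it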
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread EW
On Aug 11, 2:16 pm, Paul Rubin no.em...@nospam.invalid wrote:
 EW ericwoodwo...@gmail.com writes:
  I thought about doing it that way and I could do it that way but it
  still seems like there should be a way to clean up Queues on my own.
  If I did it this way then I guess I'd be relying on garbage collection
  when the script ended to clean up the Queues for me.

 Oh, I see.  As long as it's possible to start new producer or consumer
 threads that touch a queue, obviously that queue has to still be around.
 If the program starts all its threads at the beginning, then runs til
 they exit, then does more stuff, then you could do something like:

     # make a dictionary of queues, one queue per task type
     queues = {'sql': Queue(), 'logging': Queue()}   # ...one entry per task

     for i in range(5):   # or however many threads you want
         threading.Thread(target=your_handler, args=[queues]).start()

     del queues

 and then when all the threads exit, there are no remaining references to
 the queues.  But why do you care?  Queues aren't gigantic structures,
 they're just a list (collections.deque) with an rlock.  It's fine to let
 the gc clean them up; that's the whole point of having a gc in the first
 place.

Well I cared because I thought garbage collection would only happen
when the script ended - the entire script.  Since I plan on running
this as a service it'll run for months at a time without ending.  So I
thought I was going to have heaps of Queues hanging out in memory,
unreferenced and unloved.  It seemed like bad practice so I wanted to
get out ahead of it.

But the GC doesn't work the way I thought it worked so there's really
no problem, I guess. I was just confused about garbage collection, it seems.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread Paul Rubin
EW ericwoodwo...@gmail.com writes:
 Well I cared because I thought garbage collection would only happen
 when the script ended - the entire script.  Since I plan on running
 this as a service it'll run for months at a time without ending.  So I
 thought I was going to have heaps of Queues hanging out in memory,
 unreferenced and unloved.  It seemed like bad practice so I wanted to
 get out ahead of it.

Even if GC worked that way it wouldn't matter, if you use just one queue
per type of task.  That number should be a small constant so the memory
consumption is small.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread MRAB

Paul Rubin wrote:

EW ericwoodwo...@gmail.com writes:

Well I cared because I thought garbage collection would only happen
when the script ended - the entire script.  Since I plan on running
this as a service it'll run for months at a time without ending.  So I
thought I was going to have heaps of Queues hanging out in memory,
unreferenced and unloved.  It seemed like bad practice so I wanted to
get out ahead of it.


Even if GC worked that way it wouldn't matter, if you use just one queue
per type of task.  That number should be a small constant so the memory
consumption is small.


That's basically how _non_-garbage-collected languages work! :-)
--
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread EW
On Aug 11, 2:52 pm, Paul Rubin no.em...@nospam.invalid wrote:
 EW ericwoodwo...@gmail.com writes:
  Well I cared because I thought garbage collection would only happen
  when the script ended - the entire script.  Since I plan on running
  this as a service it'll run for months at a time without ending.  So I
  thought I was going to have heaps of Queues hanging out in memory,
  unreferenced and unloved.  It seemed like bad practice so I wanted to
  get out ahead of it.

 Even if GC worked that way it wouldn't matter, if you use just one queue
 per type of task.  That number should be a small constant so the memory
 consumption is small.

Well I can't really explain it but 1 Queue per task for what I'm
designing just doesn't feel right to me.  It feels like it will lack
future flexibility.  I like having 1 Queue per producer thread object
and the person instantiating that object can do whatever he wants with
that Queue.  I can't prove I'll need that level of flexibility but I
don't see why it's bad to have.  It's still a small number of Queues,
it's just a small, variable, number of Queues.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue cleanup

2010-08-11 Thread Paul Rubin
EW ericwoodwo...@gmail.com writes:
 Well I can't really explain it but 1 Queue per task for what I'm
 designing just doesn't feel right to me.  It feels like it will lack
 future flexibility.

That makes no sense at all.  Multiple readers and writers per queue are
the way Python queues are designed to work.  The normal way to spray a
bunch of concurrent tasks to worker threads is just have a bunch of
workers listening to one queue.  It's the same way at the producer end.
-- 
http://mail.python.org/mailman/listinfo/python-list