Re: [Python-Dev] Linus on garbage collection

2011-05-07 Thread Xavier Morel
On 2011-05-07, at 03:39 , Glyph Lefkowitz wrote:
> 
> I don't know if there's a programming language and runtime with a real-time, 
> VM-cooperating garbage collector that actually exists today which has all the 
> bells and whistles required to implement an OS kernel, so I wouldn't give the 
> Linux kernel folks too much of a hard time for still using C; but there's 
> nothing wrong with the idea in the abstract.
Not sure it had all those bells and whistles, and there were other issues, but 
I believe Lisp Machines implemented garbage collection at the hardware (or at 
least microcode) level, and the OS itself provided a pretty direct interface to 
it (it was part of the core services).

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Linus on garbage collection

2011-05-07 Thread Antoine Pitrou
On Fri, 6 May 2011 21:39:10 -0400
Glyph Lefkowitz  wrote:
> 
> The assertion that "modern hardware" is not designed for big data-structure 
> pointer-chasing is also a bit silly.  On the contrary, modern hardware has 
> evolved staggeringly massive caches, specifically because large programs 
> (whether they're GC'd or not) tend to do lots of this kind of thing, because 
> there's a certain level of complexity beyond which one can no longer avoid it.

"Staggeringly massive"?
The average 4MB L3 cache is very small compared to the heap of
non-trivial Python (or Java) workloads.

And Linus is right: modern hardware is not optimized for random
pointer-chasing, simply because optimizing for it is very hard.

Regards

Antoine.




Re: [Python-Dev] Linus on garbage collection

2011-05-07 Thread Greg Ewing

Stefan Behnel wrote:
> It's a dead-end that is referenced by a cycle, that's all.

But shouldn't it be breaking the cycle by clearing one
of the objects that's actually part of the cycle, rather
than part of the dead-end?

I can't see how the Document could get picked for clearing
unless it was actually in the cycle. Either that or I'm
imagining the cyclic GC algorithm to be smarter than it
actually is.

--
Greg


Re: [Python-Dev] Linus on garbage collection

2011-05-07 Thread Stefan Behnel

Greg Ewing, 07.05.2011 02:26:
> Stefan Behnel wrote:
>> After all, the described crash case indicates that the Document
>> destructor was called before all of the Element destructors had been
>> called, although all Elements reference their Document, but the Document
>> does not refer to any of the Elements,
>
> In that case, why was the GC system regarding this as a cycle
> at all? There must be more going on.

It's a dead-end that is referenced by a cycle, that's all.

Stefan



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Georg Brandl
On 07.05.2011 01:25, Greg Ewing wrote:
> Neal Becker wrote:
>> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
> 
> There, Linus says
> 
>> For example, if you have an _explicit_ refcounting system, then it is
>> quite natural to have operations like ...
>> 
>>  note_t *node = *np;
>>  if (node->count > 1)
>>  newnode = copy_alloc(node);
> 
> It's interesting to note that, even though you *can* get reference
> count information in CPython, it's not all that useful for doing
> things like that, because it's hard to be sure how many incidental
> references have been created on the way to the code concerned.
> So tricks like this at the Python level aren't really feasible in
> any robust way.

But they are at the C level, see for example the optimization for

  string += something

if "string"'s reference count is exactly one.
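
A rough illustration of when that optimization can and cannot apply. The two
helper functions are hypothetical names, and whether the in-place path is
actually taken is a CPython implementation detail, so this sketch only checks
that both patterns give the same result:

```python
def build_sole_reference(parts):
    """Append to a string that nothing else refers to."""
    s = ""
    for p in parts:
        # 's' is the only reference to the old string here, which is the
        # case the C-level shortcut is aimed at.
        s += p
    return s


def build_with_alias(parts):
    """Append to a string while older versions are still referenced."""
    s = ""
    history = []
    for p in parts:
        # Keeping the old string alive means it must not change,
        # so += has to produce a separate new string.
        history.append(s)
        s += p
    return s


parts = ["a", "b", "c"]
same = build_sole_reference(parts) == build_with_alias(parts) == "".join(parts)
```

Timing the two functions on long inputs is one way to see the difference, but
the numbers depend on the interpreter and should not be relied upon.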

Georg



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Glyph Lefkowitz
Apologies in advance for contributing to an obviously and increasingly 
off-topic thread, but this kind of FUD about GC is a pet peeve of mine.

On May 6, 2011, at 10:04 AM, Neal Becker wrote:

> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Counterpoint: .  Sorry Linus, sometimes 
correctness matters more than performance.

But, even the performance argument is kind of bogus.  See, for example, this 
paper on real-time garbage collection: 
.
  That's just one example of an easy-to-find solution to a problem that Linus 
holds up as unsolved or unsolvable.  There are solutions to pretty much all of 
the problems that Linus brings up.  One of these solutions is even famously 
implemented by CPython!  The CPython "string +=" idiom optimization fixes at 
least one case of the "you tend to always copy the node" antipattern Linus 
describes, and lots of languages (especially Scheme and derivatives, IIRC) have 
very nice optimizations around this area.  One could argue that any functional 
language without large pools of mutable state (e.g. Erlang) is a massive 
optimization for this case.

Another example: the "dirty cache" problem Linus talks about can be addressed 
by having a GC that cooperates with the VMM: 
.

And the "re-using stuff as fast as possible" thing is exactly the kind of 
problem that generational GCs address.  When you run out of space in cache, you 
reap your first generation before you start copying stuff.  One of the key 
insights of generational GC is that you'll usually reclaim enough (in this 
case, cache-local) memory that you can keep going for a little while.  You 
don't have to read a super fancy modern paper on this, Wikipedia explains 
nicely: 
.
  Of course if you don't tune your GC at all for your machine-specific cache 
size, you won't see this performance benefit play out.

I don't know if there's a programming language and runtime with a real-time, 
VM-cooperating garbage collector that actually exists today which has all the 
bells and whistles required to implement an OS kernel, so I wouldn't give the 
Linux kernel folks too much of a hard time for still using C; but there's 
nothing wrong with the idea in the abstract.  The performance differences 
between automatic and manual memory management are dubious at best, and with a 
really good GC and a language that supports it, GC tends to win big.  When it 
loses, it loses in ways which can be fixed in one area of the code (the GC) 
rather than by millions of tiny fixes across your whole codebase, as is the 
case with manual memory-management strategies.

The assertion that "modern hardware" is not designed for big data-structure 
pointer-chasing is also a bit silly.  On the contrary, modern hardware has 
evolved staggeringly massive caches, specifically because large programs 
(whether they're GC'd or not) tend to do lots of this kind of thing, because 
there's a certain level of complexity beyond which one can no longer avoid it.  
It's old hardware, with tiny caches (that were, by virtue of their tininess, 
closer to the main instruction-processing silicon), that was optimized for the 
"carefully stack-allocating everything in the world to conserve cache" approach.

You can see this pretty clearly by running your favorite Python benchmark on 
machines which are similar except for cache size.  The newer machine, with the 
bigger cache, will run Python considerably faster, but the extra cache doesn't 
help the average trivial C benchmark that much - or, for that matter, Linux 
benchmarks.

-glyph



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Greg Ewing

Stefan Behnel wrote:
> After all, the described crash case indicates that
> the Document destructor was called before all of the Element destructors
> had been called, although all Elements reference their Document, but the
> Document does not refer to any of the Elements,

In that case, why was the GC system regarding this as a cycle
at all? There must be more going on.

--
Greg


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Greg Ewing

Mark Shannon wrote:
> With a tracing GC:
> While the Elements are finalized, the Document is still alive.
> While the Document is finalized, the Elements are still alive.
> Then, and only then, is the whole lot reclaimed.

One problem is that, at the C level in CPython, you can't separate
finalisation and reclamation. When an object's refcount drops to
zero, its tp_dealloc method is called, which both finalises the object
and reclaims its memory.

Another problem is that just because an object's memory hasn't
been reclaimed yet doesn't mean it's safe to do anything with that
object. This is doubly true at the C level, where the consequences
can include segfaults.

Seems to me the basic issue here is that the C code wasn't designed
with tracing GC in mind. There is a reference cycle, but it is
assumed that the user is in manual control of deallocation and will
deallocate the Nodes before the Document.

--
Greg


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Nick Coghlan
Even if he's right (and he probably is), manual memory management is still a 
premature optimization for most applications. C and C++ data structures are a 
PITA because you have to be so careful to avoid leaks and double-frees, so 
people end up using dumb algorithms. Worrying about losing cycles waiting for 
main memory is stupid if your high level algorithm is O(N^2).

Cheers,
Nick.

--
Nick Coghlan, Brisbane, Australia

On 07/05/2011, at 12:04 AM, Neal Becker  wrote:

> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Greg Ewing

Mark Shannon wrote:
> For example, a file object will close itself during finalization,
> but it's still a valid object, just a closed file rather than an open one.

It might be valid in the sense that you won't get a segfault.
But the point is that the destructors of some objects may be
relying on other objects still being in a certain state,
e.g. a file still being open.

One would have to adopt a highly defensive coding style in
destructors, verging on paranoia, to be sure that one's destructor
code was completely immune to this kind of problem.
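
A minimal sketch of that defensive style, assuming a CPython version that runs
__del__ on objects in cycles (3.4 or later). Document, Element and the
c_state_freed flag are hypothetical stand-ins for the lxml situation described
in this thread, not lxml's actual classes:

```python
import gc

log = []


class Document:
    def __init__(self):
        # Stands in for C-level memory owned by the document.
        self.c_state_freed = False

    def __del__(self):
        self.c_state_freed = True
        log.append("document")


class Element:
    def __init__(self, doc):
        self.doc = doc
        self.cycle = self  # puts this Element into a reference cycle

    def __del__(self):
        # Defensive check: the Document may have been finalized first,
        # because the collector does not promise any finalizer order.
        if self.doc.c_state_freed:
            log.append("element: document already finalized, cleanup skipped")
        else:
            log.append("element: subtree cleaned up")


def build():
    doc = Document()
    Element(doc)  # kept alive only by its own cycle once build() returns


build()
gc.collect()  # both finalizers run here, in an unspecified order
```

Which of the two Element messages ends up in log depends on the order the
collector happens to pick, which is exactly why the check is needed.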

All of this worry goes away if the destructor is not a method
of the object being destroyed, but something external that
runs *after* the object has disappeared.

--
Greg


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Greg Ewing

Antoine> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html



From that note:


1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already rather broken in this regard, so feel free
to ignore this point.


It's only broken if you regard RAII as the One True Way to
implement scoped resource management. Python has other approaches
to that, such as the with-statement.
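
For illustration, a small sketch of scoped cleanup with the with-statement.
The resource() helper and the events list are made up for this example; the
point is only that cleanup runs when the block exits, independent of when the
garbage collector gets around to the object:

```python
from contextlib import contextmanager

events = []


@contextmanager
def resource(name):
    events.append("open " + name)
    try:
        yield name
    finally:
        # Runs on leaving the with-block, whether normally or via an exception.
        events.append("close " + name)


def use():
    with resource("r") as r:
        events.append("use " + r)
    events.append("after block")


use()
```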

Also, you *can* have destructors that work for objects in cycles,
as long as you don't insist on the destructor having access to
the object that's being destroyed. Weakref callbacks provide a
way of implementing this in CPython.
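
A sketch of that idea using weakref.finalize, a standard-library helper built
on weakref callbacks that was added in Python 3.4, so later than this thread.
The Node class and the released list are hypothetical; note that the callback
only receives a name, never the object being destroyed:

```python
import gc
import weakref

released = []


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a  # a <-> b form a reference cycle
    # The callbacks capture only the data they need, not the Nodes themselves.
    weakref.finalize(a, released.append, "a")
    weakref.finalize(b, released.append, "b")


make_cycle()
gc.collect()  # the cycle is unreachable now, so both callbacks fire
```

The order in which the two callbacks run is not something to rely on.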

--
Greg


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Greg Ewing

Neal Becker wrote:
> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

There, Linus says

> For example, if you have an _explicit_ refcounting system, then it is
> quite natural to have operations like ...
>
>  note_t *node = *np;
>  if (node->count > 1)
>  newnode = copy_alloc(node);

It's interesting to note that, even though you *can* get reference
count information in CPython, it's not all that useful for doing
things like that, because it's hard to be sure how many incidental
references have been created on the way to the code concerned.
So tricks like this at the Python level aren't really feasible in
any robust way.
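
A quick way to see the problem, as a sketch: sys.getrefcount is the real
CPython call, while refs_seen_inside is a made-up helper. The exact numbers
depend on the interpreter version and on the call path, which is the point:

```python
import sys


def refs_seen_inside(obj):
    # Passing the object into a function and on to getrefcount can each
    # add temporary references of their own.
    return sys.getrefcount(obj)


data = []
direct = sys.getrefcount(data)   # count as seen at module level
nested = refs_seen_inside(data)  # count as seen one call deeper

print(direct, nested)  # values are implementation details and may differ
```

Code that branched on "count == 1" would therefore have to know about every
temporary reference the interpreter creates along the way.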

--
Greg


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Dan Stromberg
On Fri, May 6, 2011 at 7:04 AM, Neal Becker  wrote:

> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
>

Of course, a generational GC improves locality of reference.


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Stefan Behnel

Mark Shannon, 06.05.2011 20:45:

Stefan Behnel wrote:

Michael Foord, 06.05.2011 19:06:

On 06/05/2011 17:51, Stefan Behnel wrote:

Mark Shannon, 06.05.2011 18:33:

s...@pobox.com wrote:

Antoine> Since we're sharing links, here's Matt Mackall's take:
Antoine>
http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html


From that note:

1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already rather broken in this regard, so feel free
to ignore this point.

Given the presence of cyclic data I don't see how reference counting or
garbage collection win. Ignoring the fact that in a pure reference counted
system you won't even consider cycles for reclamation, would both RC and GC
have to punt because they can't tell which object's destructor to call
first?

It doesn't matter which is called first.

May I quote you on that one the next time my software crashes?

Arbitrarily breaking cycles *could* cause a problem if a destructor
attempts to access an already collected object.


This is more real than the "could" suggests. Remember that CPython
includes a lot of C code, and is commonly used to interface with C
libraries. While you will simply get an exception when cycles are broken
in Python code, cycles that involve C code can suffer quite badly from
this problem.

There was a bug in the lxml.etree XML library a while ago that could let
it crash hard when its Element objects participated in a reference cycle.
It's based on libxml2, so there's an underlying C tree that potentially
involves disconnected subtrees, and a Python space representation using
Element proxies, with at least one Element for each disconnected subtree.

Basically, Elements reference their Document (not the other way round)
even if they are disconnected from the main C document tree. The Document
needs to do some final cleanup in the end, whereas the Elements require
the Document to be alive to do their own subtree cleanup, if only to know
what exactly to clean up, as the subtrees share some C state through the
document. Now, if any of the Elements ends up in a reference cycle for
some reason, the GC will roll the dice and may decide to call the
Document destructor first. Then the Element destructors are bound to
crash, trying to access dead memory of the Document.


With a tracing collector it is *impossible* to access dead memory, ever.
If it can be reached the GC will *not* collect it.
This should be a fundamental invariant of *all* GCs.

If an object is finalizable or reachable from any finalizable objects
then it is reachable and its memory should not be reclaimed until it is
truly unreachable.

Finalization and reclamation are separate phases.


Sure. However, I'm talking about Python types and C memory here. Even if 
the Python objects are still alive, they may already have freed the 
underlying C memory during their *finalisation*. When an Element goes out 
of scope, it must free its C subtree if it is disconnected, even if the 
Document stays alive. So that's what Elements do in their destructor, and 
they need the Document's C memory for that, which the Document frees during 
its own finalisation.


I do agree that CPython's destructor call algorithms could have been 
smarter in this case. After all, the described crash case indicates that 
the Document destructor was called before all of the Element destructors 
had been called, although all Elements reference their Document, but the 
Document does not refer to any of the Elements, so it's basically a dead 
end. That would have provided a detectable hint to call the Document 
destructor last, after the ones of all objects that reference it. 
Apparently, this hint did not lead to an appropriate action, possibly 
because it's an unimplemented special case and there are enough cases where 
multiple objects with destructors are actually part of the 'real' cycle.


Stefan



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Mark Shannon

Stefan Behnel wrote:

Michael Foord, 06.05.2011 19:06:

On 06/05/2011 17:51, Stefan Behnel wrote:

Mark Shannon, 06.05.2011 18:33:

s...@pobox.com wrote:

Antoine> Since we're sharing links, here's Matt Mackall's take:
Antoine>
http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html


From that note:

1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already rather broken in this regard, so feel free
to ignore this point.

Given the presence of cyclic data I don't see how reference counting or
garbage collection win. Ignoring the fact that in a pure reference counted
system you won't even consider cycles for reclamation, would both RC and GC
have to punt because they can't tell which object's destructor to call
first?

It doesn't matter which is called first.

May I quote you on that one the next time my software crashes?

Arbitrarily breaking cycles *could* cause a problem if a destructor
attempts to access an already collected object.


This is more real than the "could" suggests. Remember that CPython includes 
a lot of C code, and is commonly used to interface with C libraries. While 
you will simply get an exception when cycles are broken in Python code, 
cycles that involve C code can suffer quite badly from this problem.


There was a bug in the lxml.etree XML library a while ago that could let it 
crash hard when its Element objects participated in a reference cycle. It's 
based on libxml2, so there's an underlying C tree that potentially involves 
disconnected subtrees, and a Python space representation using Element 
proxies, with at least one Element for each disconnected subtree.


Basically, Elements reference their Document (not the other way round) even 
if they are disconnected from the main C document tree. The Document needs 
to do some final cleanup in the end, whereas the Elements require the 
Document to be alive to do their own subtree cleanup, if only to know what 
exactly to clean up, as the subtrees share some C state through the 
document. Now, if any of the Elements ends up in a reference cycle for some 
reason, the GC will roll the dice and may decide to call the Document 
destructor first. Then the Element destructors are bound to crash, trying 
to access dead memory of the Document.


With a tracing collector it is *impossible* to access dead memory, ever.
If it can be reached the GC will *not* collect it.
This should be a fundamental invariant of *all* GCs.

If an object is finalizable or reachable from any finalizable objects
then it is reachable and its memory should not be reclaimed until it is 
truly unreachable.


Finalization and reclamation are separate phases.



This was easy to fix in CPython's refcounting environment. A double INCREF 
on the Document for each Element does the trick, as it effectively removes 
the Document from the collectable cycle and lets the Element destructors 
decide when to let the Document refcount go down to 0. A fix in a pure GC 
system is substantially harder to make efficient.


With a tracing GC:
While the Elements are finalized, the Document is still alive.
While the Document is finalized, the Elements are still alive.
Then, and only then, is the whole lot reclaimed.

Mark.


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Stefan Behnel

Michael Foord, 06.05.2011 19:06:

On 06/05/2011 17:51, Stefan Behnel wrote:

Mark Shannon, 06.05.2011 18:33:

s...@pobox.com wrote:

Antoine> Since we're sharing links, here's Matt Mackall's take:
Antoine>
http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html


From that note:


1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already rather broken in this regard, so feel free
to ignore this point.

Given the presence of cyclic data I don't see how reference counting or
garbage collection win. Ignoring the fact that in a pure reference counted
system you won't even consider cycles for reclamation, would both RC and GC
have to punt because they can't tell which object's destructor to call
first?


It doesn't matter which is called first.


May I quote you on that one the next time my software crashes?


Arbitrarily breaking cycles *could* cause a problem if a destructor
attempts to access an already collected object.


This is more real than the "could" suggests. Remember that CPython includes 
a lot of C code, and is commonly used to interface with C libraries. While 
you will simply get an exception when cycles are broken in Python code, 
cycles that involve C code can suffer quite badly from this problem.


There was a bug in the lxml.etree XML library a while ago that could let it 
crash hard when its Element objects participated in a reference cycle. It's 
based on libxml2, so there's an underlying C tree that potentially involves 
disconnected subtrees, and a Python space representation using Element 
proxies, with at least one Element for each disconnected subtree.


Basically, Elements reference their Document (not the other way round) even 
if they are disconnected from the main C document tree. The Document needs 
to do some final cleanup in the end, whereas the Elements require the 
Document to be alive to do their own subtree cleanup, if only to know what 
exactly to clean up, as the subtrees share some C state through the 
document. Now, if any of the Elements ends up in a reference cycle for some 
reason, the GC will roll the dice and may decide to call the Document 
destructor first. Then the Element destructors are bound to crash, trying 
to access dead memory of the Document.


This was easy to fix in CPython's refcounting environment. A double INCREF 
on the Document for each Element does the trick, as it effectively removes 
the Document from the collectable cycle and lets the Element destructors 
decide when to let the Document refcount go down to 0. A fix in a pure GC 
system is substantially harder to make efficient.


Stefan



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread skip

Michael> "Therefore we decided to break such a cycle at an arbitrary
Michael> place, which doesn't sound too insane."

I trust "arbitrary" != "random"?

Skip


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Mark Shannon

Michael Foord wrote:

On 06/05/2011 18:26, Mark Shannon wrote:

Michael Foord wrote:

On 06/05/2011 17:51, Stefan Behnel wrote:

Mark Shannon, 06.05.2011 18:33:

s...@pobox.com wrote:

Antoine> Since we're sharing links, here's Matt Mackall's take:
Antoine>
http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html


From that note:

1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already rather broken in this regard, so feel free
to ignore this point.

Given the presence of cyclic data I don't see how reference counting or
garbage collection win. Ignoring the fact that in a pure reference counted
system you won't even consider cycles for reclamation, would both RC and GC
have to punt because they can't tell which object's destructor to call
first?

It doesn't matter which is called first.

May I quote you on that one the next time my software crashes?

Arbitrarily breaking cycles *could* cause a problem if a destructor 
attempts to access an already collected object. Not breaking cycles 
*definitely* leaks memory and definitely doesn't call finalizers.
You don't need to break the cycles to call the finalizers. Just call 
them, then collect the whole cycle (assuming it is still unreachable).


The GC will *never* reclaim a reachable object. Objects awaiting 
finalization are reachable, by definition.



Well it was sloppily worded, so replace it with:

 if a finalizer attempts to access an already finalized object.


A finalized object will still be a valid object.
Python code cannot make an object unsafe.
Obviously C code can make it unsafe, but that's true of C code anywhere.

For example, a file object will close itself during finalization,
but it's still a valid object, just a closed file rather than an open one.
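
The "closed but still valid" state is easy to see with an in-memory file. This
is only a sketch of the idea: it closes the object explicitly rather than via
finalization, and io.StringIO stands in for a real file:

```python
import io

f = io.StringIO("some data")
f.close()

# The object is still perfectly usable as a Python object...
still_valid = f.closed

# ...but operations that need an open file fail cleanly instead of crashing.
try:
    f.read()
    outcome = "read succeeded"
except ValueError:
    outcome = "ValueError"
```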


Michael

Michael

It may not make a difference for the runtime, but the difference for 
user software may be "dead" or "alive".


Stefan



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Michael Foord

On 06/05/2011 18:07, Glyph Lefkowitz wrote:

On May 6, 2011, at 12:31 PM, Michael Foord wrote:

pypy and .NET choose to arbitrarily break cycles rather than leave 
objects unfinalised and memory unreclaimed. Not sure what Java does.


I think that's a mischaracterization of their respective collectors; 
"arbitrarily break cycles" implies that user code would see broken or 
incomplete objects, at least during finalization, which I'm fairly 
sure is not true on either .NET or PyPy.


http://morepypy.blogspot.com/2008/02/python-finalizers-semantics-part-1.html

"Therefore we decided to break such a cycle at an arbitrary place, which 
doesn't sound too insane."


All the best,

Michael Foord


Java definitely has a collector that can handle cycles too.  (None of 
these are reference counting.)


-glyph



--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Stephen J. Turnbull
Mark Shannon writes:
 > 
 > 
 > Neal Becker wrote:
 > > http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
 > > 
 > Being famous does not necessarily make you right.

No, but being a genius sure helps you beat the odds.

 > OS kernels are pretty atypical software,
 > even if Linus is right about Linux, it doesn't apply to Python.

Well, actually he was writing about GCC.

 > I have empirical evidence, not opinion, that PyPy and my own HotPy
 > are a *lot* faster (x5 or better) on Unladen Swallow's gcbench benchmark 
 > (which stresses the memory management subsystem).

You're missing Linus's point, I think.  Linus did *not* claim that
it's impossible to write a fast *GC*.  He claimed that it's hard to
write a fast *program* that uses GC for memory management.  A
benchmark that stresses *only* the memory management system is
unlikely to impress him.



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Glyph Lefkowitz
On May 6, 2011, at 12:31 PM, Michael Foord wrote:

> pypy and .NET choose to arbitrarily break cycles rather than leave objects 
> unfinalised and memory unreclaimed. Not sure what Java does.

I think that's a mischaracterization of their respective collectors; 
"arbitrarily break cycles" implies that user code would see broken or 
incomplete objects, at least during finalization, which I'm fairly sure is not 
true on either .NET or PyPy.

Java definitely has a collector that can handle cycles too.  (None of these 
are reference counting.)

-glyph


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Michael Foord

On 06/05/2011 17:51, Stefan Behnel wrote:

Mark Shannon, 06.05.2011 18:33:

s...@pobox.com wrote:

Antoine> Since we're sharing links, here's Matt Mackall's take:
Antoine>
http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html


From that note:


1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already rather broken in this regard, so feel free
to ignore this point.

Given the presence of cyclic data I don't see how reference counting or
garbage collection win. Ignoring the fact that in a pure reference counted
system you won't even consider cycles for reclamation, would both RC and GC
have to punt because they can't tell which object's destructor to call
first?


It doesn't matter which is called first.


May I quote you on that one the next time my software crashes?



Arbitrarily breaking cycles *could* cause a problem if a destructor 
attempts to access an already collected object. Not breaking cycles 
*definitely* leaks memory and definitely doesn't call finalizers.


Michael

It may not make a difference for the runtime, but the difference for 
user software may be "dead" or "alive".


Stefan




--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Michael Foord

On 06/05/2011 17:32, Gregory P. Smith wrote:
> On Fri, May 6, 2011 at 9:18 AM,  wrote:
>> Antoine> Since we're sharing links, here's Matt Mackall's take:
>> Antoine> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html
>>
>> From that note:
>>
>> 1: You can't have meaningful destructors, because when destruction
>> happens is undefined. And going-out-of-scope destructors are extremely
>> useful. Python is already a rather broken in this regard, so feel free
>> to ignore this point.
>
> Python being "broken" in this regard is pretty much exactly why
> __enter__, __exit__ and with as context managers were added to the
> language.

How does that help with cycles? Sure it makes cleaning up some resources 
easier, but not at all this case. Explicit destruction is of course 
always an alternative to the runtime doing it for you, but it doesn't 
help with (for example) reclaiming memory. For long running processes 
memory leaks due to unreclaimable cycles can be a problem with CPython.

> That gives the ability to have the equivalent of well defined nested
> scopes that destroy something (exit) deterministically much as it is
> easy to do in C++ with some {}s and a ~destructor().
>
> It is not broken, just different.

+1 QOTW ;-)

Michael

> -gps


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Stefan Behnel

Mark Shannon, 06.05.2011 18:33:
> s...@pobox.com wrote:
>> Antoine> Since we're sharing links, here's Matt Mackall's take:
>> Antoine> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html
>>
>> From that note:
>>
>> 1: You can't have meaningful destructors, because when destruction
>> happens is undefined. And going-out-of-scope destructors are extremely
>> useful. Python is already a rather broken in this regard, so feel free
>> to ignore this point.
>>
>> Given the presence of cyclic data I don't see how reference counting or
>> garbage collection win. Ignoring the fact that in a pure reference counted
>> system you won't even consider cycles for reclamation, would both RC and GC
>> have to punt because they can't tell which object's destructor to call
>> first?
>
> It doesn't matter which is called first.


May I quote you on that one the next time my software crashes?

It may not make a difference for the runtime, but the difference for user 
software may be "dead" or "alive".


Stefan



Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Mark Shannon

s...@pobox.com wrote:
> Antoine> Since we're sharing links, here's Matt Mackall's take:
> Antoine> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html
>
> From that note:
>
> 1: You can't have meaningful destructors, because when destruction
> happens is undefined. And going-out-of-scope destructors are extremely
> useful. Python is already a rather broken in this regard, so feel free
> to ignore this point.
>
> Given the presence of cyclic data I don't see how reference counting or
> garbage collection win.  Ignoring the fact that in a pure reference counted
> system you won't even consider cycles for reclamation, would both RC and GC
> have to punt because they can't tell which object's destructor to call
> first?


It doesn't matter which is called first.
In fact, the VM could call all the destructors at the same time if the 
machine has enough cores and there's no GIL.


All objects are kept alive by the GC until after the destructors are 
called. Those that are still dead will have their memory reclaimed.
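
[Editorial sketch] The behaviour described above can be tried out with a
small script. This is only an illustration and rests on an assumption: it
needs CPython 3.4 or later, where PEP 442 lets the cycle collector run
__del__ on objects that sit in a cycle. The CPython releases current at the
time of this thread parked such cycles in gc.garbage instead. The Tracked
class and the finalize_cycle helper are made-up names. The order in which
the two finalizers run is not guaranteed, so the result is sorted.

```python
import gc


class Tracked:
    """Hypothetical object whose finalizer records itself and its peer."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.peer = None

    def __del__(self):
        # Both halves of the cycle are unreachable at this point, yet the
        # peer is still a fully usable object: nothing has been torn down.
        self.log.append((self.name, self.peer.name))


def finalize_cycle():
    """Build an a <-> b cycle, drop it, and return what the finalizers saw."""
    log = []
    a, b = Tracked("a", log), Tracked("b", log)
    a.peer, b.peer = b, a   # reference cycle
    del a, b                # the cycle is now garbage
    gc.collect()            # finalizers run first, memory is reclaimed after
    return sorted(log)
```

Calling finalize_cycle() shows that each finalizer could still read its
peer's name, whichever of the two ran first.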




> Skip


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Gregory P. Smith
On Fri, May 6, 2011 at 9:18 AM,   wrote:
>
>    Antoine> Since we're sharing links, here's Matt Mackall's take:
>    Antoine> 
> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html
>
> From that note:
>
>    1: You can't have meaningful destructors, because when destruction
>    happens is undefined. And going-out-of-scope destructors are extremely
>    useful. Python is already a rather broken in this regard, so feel free
>    to ignore this point.

Python being "broken" in this regard is pretty much exactly why
__enter__, __exit__ and with as context managers were added to the
language.

That gives the ability to have the equivalent of well defined nested
scopes that destroy something (exit) deterministically much as it is
easy to do in C++ with some {}s and a ~destructor().

It is not broken, just different.
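
[Editorial sketch] A minimal illustration of the nested-scope behaviour
described above, using only the __enter__/__exit__ protocol. The Resource
class and the nested_scopes helper are hypothetical names invented for this
example; they are not part of any library.

```python
class Resource:
    """Hypothetical resource that records when it is acquired and released."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append("acquire " + self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs as soon as the with block is left, normally or via an
        # exception, independently of when the object itself is collected.
        self.log.append("release " + self.name)
        return False  # do not swallow exceptions


def nested_scopes():
    """Return the order of events for two nested with blocks."""
    log = []
    with Resource("outer", log):
        with Resource("inner", log):
            log.append("work")
    return log
```

The releases happen in reverse order of acquisition, much like C++
destructors at the closing brace of nested blocks.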

-gps


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Michael Foord

On 06/05/2011 17:18, s...@pobox.com wrote:
> Antoine> Since we're sharing links, here's Matt Mackall's take:
> Antoine> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html
>
> From that note:
>
> 1: You can't have meaningful destructors, because when destruction
> happens is undefined. And going-out-of-scope destructors are extremely
> useful. Python is already a rather broken in this regard, so feel free
> to ignore this point.
>
> Given the presence of cyclic data I don't see how reference counting or
> garbage collection win.  Ignoring the fact that in a pure reference counted
> system you won't even consider cycles for reclamation, would both RC and GC
> have to punt because they can't tell which object's destructor to call
> first?

pypy and .NET choose to arbitrarily break cycles rather than leave 
objects unfinalised and memory unreclaimed. Not sure what Java does.

All the best,

Michael Foord

> Skip


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread skip

Antoine> Since we're sharing links, here's Matt Mackall's take:
Antoine> http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html

From that note:

1: You can't have meaningful destructors, because when destruction
happens is undefined. And going-out-of-scope destructors are extremely
useful. Python is already a rather broken in this regard, so feel free
to ignore this point.

Given the presence of cyclic data I don't see how reference counting or
garbage collection win.  Ignoring the fact that in a pure reference counted
system you won't even consider cycles for reclamation, would both RC and GC
have to punt because they can't tell which object's destructor to call
first?
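
[Editorial sketch] The remark about pure reference counting can be checked
directly. The sketch below assumes CPython, where gc.disable() switches off
the automatic cycle collector and leaves reference counting as the only
reclamation mechanism. Node and cycle_survives_refcounting are made-up names
for the example.

```python
import gc
import weakref


class Node:
    """Plain object that can hold a reference to another Node."""

    def __init__(self):
        self.other = None


def cycle_survives_refcounting():
    """Return (alive_before_gc, alive_after_gc) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()                 # only reference counting is active now
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # a <-> b reference cycle
        probe = weakref.ref(a)   # observe a without keeping it alive
        del a, b                 # drop the only external references
        alive_before = probe() is not None
        gc.collect()             # run the cycle collector explicitly
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

With reference counting alone the cycle stays alive after the last external
reference is dropped; it only goes away once the cycle collector runs.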

Skip


Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Antoine Pitrou
On Fri, 06 May 2011 15:46:08 +0100
Mark Shannon  wrote:
> 
> Neal Becker wrote:
> > http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
> > 
> Being famous does not necessarily make you right.
> 
> OS kernels are pretty atypical software,
> even if Linus is right about Linux, it doesn't apply to Python.
> 
> I have empirical evidence, not opinion, that PyPy and my own HotPy
> are a *lot* faster (x5 or better) on Unladen Swallow's gcbench benchmark 
> (which stresses the memory management subsystem).
> 
> (Note that gcbench does not introduce any cycles, so it's being easy on 
> CPython)
> 
> In fact, for gcbench CPython spends over twice as long in the 
> cycle-collector as HotPy takes in total!

The thing is, it would be easy to change our collection heuristics
so that the cycle collector gets called less often (actually, you can
already do so using gc.set_threshold, IIRC). Something which is much
more delicate for a "full" GC, where it would grow memory consumption a
lot.
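
[Editorial sketch] For reference, this is roughly what tuning the heuristic
through gc.set_threshold could look like. gc.get_threshold and
gc.set_threshold are the real standard-library calls; the helper name
with_relaxed_threshold and the idea of scaling by a factor are assumptions
made for the example, not a recommendation of particular values.

```python
import gc


def with_relaxed_threshold(factor, work):
    """Run work() with the generation-0 threshold scaled by factor.

    A larger threshold means the cycle collector is triggered less often.
    The previous thresholds are restored afterwards.
    """
    old = gc.get_threshold()
    gc.set_threshold(old[0] * factor, *old[1:])
    try:
        return work()
    finally:
        gc.set_threshold(*old)
```

Because reference counting still frees acyclic garbage immediately, running
the cycle collector less often only delays the reclamation of cycles.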

Regards

Antoine.




Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Mark Shannon



Neal Becker wrote:
> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html


Being famous does not necessarily make you right.

OS kernels are pretty atypical software,
even if Linus is right about Linux, it doesn't apply to Python.

I have empirical evidence, not opinion, that PyPy and my own HotPy
are a *lot* faster (x5 or better) on Unladen Swallow's gcbench benchmark 
(which stresses the memory management subsystem).


(Note that gcbench does not introduce any cycles, so it's being easy on 
CPython)


In fact, for gcbench CPython spends over twice as long in the 
cycle-collector as HotPy takes in total!

I don't have such detailed results for PyPy.

For other benchmarks, the HotPy GC times are often smaller than the 
inter-run variations in runtime, for example:


HotPy GC stats for pystones (on a slow machine with a small cache):

Total memory allocated: 20 Mbytes.
20 minor collections, 0 major collections
Max heap size 2.4 Mbytes.
Total time spent in GC: 3.5 milliseconds. ( <1% of execution time)
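
[Editorial sketch] The figures above come from HotPy's own instrumentation,
which is not shown in this thread. A comparable "time spent in GC" number
for CPython's cycle collector could be gathered with the gc.callbacks hook,
available in CPython 3.3 and later. The helper name time_in_cycle_collector
is made up, and the sketch does not account for time spent in reference
counting.

```python
import gc
import time


def time_in_cycle_collector(work):
    """Run work() and return (result, seconds in cycle GC, collections)."""
    state = {"start": 0.0, "total": 0.0, "runs": 0}

    def on_gc(phase, info):
        # CPython calls this with phase "start" before a collection and
        # "stop" after it; info is a dict with details of the collection.
        if phase == "start":
            state["start"] = time.perf_counter()
        elif phase == "stop":
            state["total"] += time.perf_counter() - state["start"]
            state["runs"] += 1

    gc.callbacks.append(on_gc)
    try:
        result = work()
    finally:
        gc.callbacks.remove(on_gc)
    return result, state["total"], state["runs"]
```

Dividing the returned seconds by the wall-clock time of work() gives a
percentage comparable in spirit to the "<1% of execution time" line above.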

My GC is quick, but it's not the fastest.

Evidence trumps opinion IMHO ;)

Cheers,
Mark.




Re: [Python-Dev] Linus on garbage collection

2011-05-06 Thread Antoine Pitrou
On Fri, 06 May 2011 10:04:09 -0400
Neal Becker  wrote:
> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Since we're sharing links, here's Matt Mackall's take:
http://www.selenic.com/pipermail/mercurial-devel/2011-May/031055.html

cheers

Antoine.





[Python-Dev] Linus on garbage collection

2011-05-06 Thread Neal Becker
http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
