On 9/13/07, Greg Ewing [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote:
what if ... we use atomic test-and-set to
handle reference counting (with a lock for those CPU architectures where we
haven't written the necessary assembler fragment), then implement a lock for
each mutable type and
Hrvoje Nikšić wrote:
On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
To put it another way, would it actually matter if the reference
counts for such objects became hopelessly wrong due to non-atomic
adjustments?
If they drop to zero (which may happen due to non-atomic adjustments),
On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:
Could be worth a try. A first step might be to just implement
the atomic refcounting, and run that single-threaded to see
if it has terribly bad effects on performance.
I've done this experiment. It was about 12% on my box. Later, once I
On Thu, 2007-09-13 at 18:38 -0500, [EMAIL PROTECTED] wrote:
Hrvoje> More precisely, Python will call the deallocator appropriate for
Hrvoje> the object type. If that deallocator does nothing, the object
Hrvoje> continues to live. Such objects could also start out with a
Hrvoje>
On 9/14/07, Justin Tulloss [EMAIL PROTECTED] wrote:
On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:
Could be worth a try. A first step might be to just implement
the atomic refcounting, and run that single-threaded to see
if it has terribly bad effects on performance.
I've done this
At 1:51 AM -0500 9/14/07, Justin Tulloss wrote:
On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:
Could be worth a try. A first step might be to just implement
the atomic refcounting, and run that single-threaded to see
if it has terribly bad effects on performance.
I've
Your idea can be combined with the maxint/2 initial refcount for
non-disposable objects, which should about eliminate thread-count updates
for them.
--
I don't really like the maxint/2 idea because it requires us to
differentiate between globals and everything else. Plus, it's a hack. I'd
On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss [EMAIL PROTECTED] wrote:
Your idea can be combined with the maxint/2 initial refcount for
non-disposable objects, which should about eliminate thread-count updates
for them.
--
I don't really like the maxint/2 idea because it requires us to
Jean-Paul Calderone wrote:
On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss [EMAIL PROTECTED] wrote:
Your idea can be combined with the maxint/2 initial refcount for
non-disposable objects, which should about eliminate thread-count updates
for them.
--
I don't really like the maxint/2
On Fri, 14 Sep 2007 17:43:39 -0400, James Y Knight [EMAIL PROTECTED] wrote:
On Sep 14, 2007, at 3:30 PM, Jean-Paul Calderone wrote:
On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss [EMAIL PROTECTED] wrote:
Your idea can be combined with the maxint/2 initial refcount for
non-disposable
On 9/14/07, Jean-Paul Calderone [EMAIL PROTECTED] wrote:
On Fri, 14 Sep 2007 17:43:39 -0400, James Y Knight [EMAIL PROTECTED] wrote:
On Sep 14, 2007, at 3:30 PM, Jean-Paul Calderone wrote:
On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss [EMAIL PROTECTED] wrote:
Your idea can be combined
Justin Tulloss wrote:
What do you think of a model where there is a global
thread count that keeps track of how many threads reference an object?
I've thought about that sort of thing before. The problem
is how you keep track of how many threads reference an
object, without introducing far
Adam Olsen wrote:
I'm now working on an approach that writes out refcounts in batches to
reduce contention. The initial cost is much higher, but it scales
better too. I've currently got it to just under 50% cost, meaning two
threads is a slight net gain.
What do you think?
I'm going to have to agree with Martin here, although I'm not sure I
understand what you're saying entirely. Perhaps if you explained where the
benefits of this approach come from, it would clear up what you're thinking.
After a few days of thought, I'm starting to realize
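The batching idea quoted above (buffer refcount changes per thread and write them out together) can be sketched roughly as follows. This is only an illustration in Python, not Adam Olsen's implementation: the class `BatchedRefcounts`, its `batch_size` parameter, and the use of string keys in place of real objects are all assumptions made for the example.

```python
import threading
from collections import defaultdict


class BatchedRefcounts:
    """Toy model: each thread buffers refcount deltas and publishes them in batches.

    Hypothetical names throughout; this is not how CPython stores refcounts.
    """

    def __init__(self, batch_size=64):
        self._lock = threading.Lock()        # guards the shared table only
        self._shared = defaultdict(int)      # key -> committed refcount
        self._local = threading.local()      # per-thread pending deltas
        self._batch_size = batch_size

    def _pending(self):
        # Lazily create this thread's private buffer.
        if not hasattr(self._local, "deltas"):
            self._local.deltas = defaultdict(int)
            self._local.n = 0
        return self._local.deltas

    def adjust(self, key, delta):
        """Record an incref (+1) or decref (-1) without touching shared state."""
        pending = self._pending()
        pending[key] += delta
        self._local.n += 1
        if self._local.n >= self._batch_size:
            self.flush()

    def flush(self):
        """Write this thread's buffered deltas to the shared table under one lock."""
        pending = self._pending()
        with self._lock:
            for key, delta in pending.items():
                self._shared[key] += delta
        pending.clear()
        self._local.n = 0

    def committed(self, key):
        with self._lock:
            return self._shared[key]


counts = BatchedRefcounts(batch_size=64)


def worker():
    for _ in range(1000):
        counts.adjust("obj", +1)
    for _ in range(999):
        counts.adjust("obj", -1)
    counts.flush()  # publish whatever is still buffered


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts.committed("obj"))  # each of the 4 threads nets +1
```

The shared table is touched once per batch instead of once per adjustment, which is where any reduction in contention would come from. The sketch leaves out the hard part: while deltas are still buffered, the committed count alone cannot tell you whether an object is really unreferenced.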
Phillip J. Eby wrote:
It's not just caches and counters. It's also every built-in type
structure, builtin module, builtin function... any Python object
that's a built-in, period. That includes things like None, True, and False.
Caches would include such things as the pre-created
* Christian Heimes wrote:
Pardon my ignorance but why does Python do reference counting for truly
global and static objects like None, True, False, small and cached
integers, sys and other builtins? If I understand it correctly these
objects are never garbage collected (at least they
On Thu, Sep 13, 2007 at 12:19:21PM +0200, André Malo wrote:
Pardon my ignorance but why does Python do reference counting for truly
global and static objects like None, True, False, small and cached
integers, sys and other builtins? If I understand it correctly these
objects are never
To put it another way, would it actually matter if the reference
counts for such objects became hopelessly wrong due to non-atomic
adjustments?
If they drop to zero (which may happen due to non-atomic adjustments),
Python will try to release the static memory, which will crash the
malloc
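To make "non-atomic adjustments" concrete: an increment is a load, an add, and a store, and two threads that interleave those steps can lose an update. Below is a small deterministic sketch in Python. The interleaving is written out by hand rather than produced by real threads, and `Obj` with its `refcnt` field is a hypothetical stand-in, not CPython's object layout.

```python
class Obj:
    """Toy object with an explicit reference count (hypothetical layout)."""

    def __init__(self, refcnt):
        self.refcnt = refcnt


def racy_incref_twice(obj):
    """Two 'threads' each take a new reference, with their steps interleaved."""
    a = obj.refcnt      # thread A loads the count
    b = obj.refcnt      # thread B loads the same, now stale, value
    obj.refcnt = a + 1  # thread A stores
    obj.refcnt = b + 1  # thread B stores and overwrites A's update


obj = Obj(refcnt=1)     # one existing owner
racy_incref_twice(obj)  # two new owners, but only one increment survives

# The two new owners later release their references correctly.
obj.refcnt -= 1
obj.refcnt -= 1

# The count is now zero although the original owner still holds a reference,
# which is the point at which a deallocator would be called.
print(obj.refcnt)
```

A lost increment is the dangerous direction, because it leads to a premature drop to zero. A lost decrement only leaks the object.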
On Thu, Sep 13, 2007 at 01:15:39PM +0200, Martin v. Löwis wrote:
To put it another way, would it actually matter if the reference
counts for such objects became hopelessly wrong due to non-atomic
adjustments?
If they drop to zero (which may happen due to non-atomic adjustments),
Python
Jon> To put it another way, would it actually matter if the reference
Jon> counts for such objects became hopelessly wrong due to non-atomic
Jon> adjustments?
I believe this was suggested and tried by someone (within the last few
years). It wasn't any benefit. The costs of
On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
To put it another way, would it actually matter if the reference
counts for such objects became hopelessly wrong due to non-atomic
adjustments?
If they drop to zero (which may happen due to non-atomic adjustments),
Python will try
On Sep 13, 2007, at 10:12 AM, Martin v. Löwis wrote:
What do you think?
I think what you are describing is the situation of today,
except in a less-performant way. The kernel *already*
implements such a synchronization server, except that
all CPUs can act as such. You write
Since we are
Since we are guaranteeing that synchronized code is running on a single
core, it is the equivalent of a lock at the cost of a context switch.
This is precisely what a lock costs today: a context switch.
Really? Wouldn't we save some memory allocation overhead (since in my
design, the lock
On Sep 13, 2007, at 9:25 PM, Martin v. Löwis wrote:
Since we are guaranteeing that synchronized code is running on a single
core, it is the equivalent of a lock at the cost of a context switch.
This is precisely what a lock costs today: a context switch.
Really? Wouldn't we save
http://www.artima.com/weblogs/viewpost.jsp?thread=214235) that the
slowdown was 2x in a single threaded application (which couldn't be due
to lock contention), it must be due to lock overhead (unless the
programming was otherwise faulty or there is something else about locks
that I don't know
On 9/13/07, Hrvoje Nikšić [EMAIL PROTECTED] wrote:
On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
To put it another way, would it actually matter if the reference
counts for such objects became hopelessly wrong due to non-atomic
adjustments?
If they drop to zero (which may
On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:
1. Use message passing and transactions. [...]
2. Do it perl style. [...]
3. Come up with an elegant way of handling multiple python processes. [...]
4. Remove the GIL, use transactions for python objects, [...]
The SpiderMonkey JavaScript
On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:
On 9/13/07, Adam Olsen [EMAIL PROTECTED] wrote:
Basically though, atomic incref/decref won't work. Once you've got
two threads modifying the same location the costs skyrocket. Even
without being properly atomic you'll get the same
On 9/13/07, Jason Orendorff [EMAIL PROTECTED] wrote:
On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:
1. Use message passing and transactions. [...]
2. Do it perl style. [...]
3. Come up with an elegant way of handling multiple python processes. [...]
4. Remove the GIL, use
On 9/13/07, Adam Olsen [EMAIL PROTECTED] wrote:
Basically though, atomic incref/decref won't work. Once you've got
two threads modifying the same location the costs skyrocket. Even
without being properly atomic you'll get the same slowdown on x86
(whose cache coherency is fairly strict.)
Martin v. Löwis wrote:
Now we are getting into details: you do NOT have to lock
an object to modify its reference count. An atomic
increment/decrement operation is enough.
I stand corrected. But if it were as simple as that,
I think it would have been done by now. I got the
impression that
Hrvoje> More precisely, Python will call the deallocator appropriate for
Hrvoje> the object type. If that deallocator does nothing, the object
Hrvoje> continues to live. Such objects could also start out with a
Hrvoje> refcount of sys.maxint or so to ensure that calls to the no-op
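The two ideas in this snippet, a deallocator that does nothing and a very large starting refcount, can be modelled in a few lines. This is a toy model in Python, not CPython's C implementation: `Obj`, `incref`, `decref`, and `IMMORTAL_REFCNT` are hypothetical names, and `sys.maxsize` is used because `sys.maxint` exists only in Python 2.

```python
import sys

# Assumed starting count for never-freed objects (the thread suggests maxint/2).
IMMORTAL_REFCNT = sys.maxsize // 2


class Obj:
    """Toy object with an explicit refcount and a per-object deallocator."""

    def __init__(self, name, immortal=False):
        self.name = name
        self.immortal = immortal
        self.refcnt = IMMORTAL_REFCNT if immortal else 1
        self.alive = True

    def dealloc(self):
        if self.immortal:
            return          # no-op deallocator: the object continues to live
        self.alive = False  # ordinary objects are released


def incref(obj):
    obj.refcnt += 1


def decref(obj):
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.dealloc()


none_like = Obj("None", immortal=True)
for _ in range(1000):
    decref(none_like)       # nowhere near zero, so dealloc is not even called

none_like.refcnt = 1        # pretend racy updates corrupted the count
decref(none_like)           # reaches zero, but the deallocator does nothing

temp = Obj("temp")
decref(temp)                # ordinary object: freed when the count hits zero

print(none_like.alive, temp.alive)
```

The large starting value keeps the no-op deallocator from being called often. The no-op deallocator is what keeps the object safe if the count still reaches zero.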
On Thu, Sep 13, 2007 at 06:38:05PM -0500, [EMAIL PROTECTED] wrote:
Hrvoje> More precisely, Python will call the deallocator appropriate for
Hrvoje> the object type. If that deallocator does nothing, the object
Hrvoje> continues to live. Such objects could also start out with a
Jon Ribbens wrote:
To put it another way, would it actually matter if the reference
counts for such objects became hopelessly wrong due to non-atomic
adjustments?
Again, it would cost time to check whether you could
get away with doing non-atomic refcounting.
If you're thinking that no check
[EMAIL PROTECTED] wrote:
what if ... we use atomic test-and-set to
handle reference counting (with a lock for those CPU architectures where we
haven't written the necessary assembler fragment), then implement a lock for
each mutable type and another for global state (thread state, interpreter
Prateek Sureka wrote:
Naturally, we need to make the locking more
fine-grained to resolve this. Hopefully we can do so in a way that
does not increase the lock overhead (hence my suggestion for a lock
free approach using an asynch queue and a core as dedicated server).
What you don't
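The "asynch queue plus dedicated server" proposal can be approximated in Python with a single worker thread that runs all synchronized operations. This is only a sketch of the general pattern, not Prateek's design: `SyncServer` and its methods are hypothetical names, and a thread stands in for the dedicated core.

```python
import queue
import threading


class SyncServer:
    """Runs every submitted callable on one dedicated thread, in arrival order."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:            # sentinel: shut down
                break
            func, reply = item
            try:
                reply.put((True, func()))
            except Exception as exc:    # hand errors back to the caller
                reply.put((False, exc))

    def call(self, func):
        """Submit func and block until the server thread has run it."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def stop(self):
        self._requests.put(None)
        self._thread.join()


server = SyncServer()
counter = {"n": 0}


def bump():
    # Only ever executed on the server thread, so no lock is needed here.
    counter["n"] += 1
    return counter["n"]


def client():
    for _ in range(500):
        server.call(bump)


clients = [threading.Thread(target=client) for _ in range(4)]
for t in clients:
    t.start()
for t in clients:
    t.join()
server.stop()

print(counter["n"])  # 4 clients x 500 calls
```

Every `call` blocks the caller until the server thread replies, so each synchronized operation costs a hand-off between threads. That is the objection raised elsewhere in the thread: the cost is comparable to what a lock costs today.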
Jason Orendorff wrote:
The clever bit is that SpiderMonkey's per-object
locking does *not* require a context switch or even an atomic
instruction, in the usual case where an object is *not* shared among
threads.
How does it tell whether an object is shared between
threads? That sounds like
Pardon me for talking with no experience in such matters, but...
Okay, incrementing a reference counter is atomic, therefore the cheapest
possible operation. Is it possible to keep reference counting atomic in a
multi-thread model?
Could you do the following... let's consider two threads, A and
On 9/13/07, Greg Ewing [EMAIL PROTECTED] wrote:
Jason Orendorff wrote:
The clever bit is that SpiderMonkey's per-object
locking does *not* require a context switch or even an atomic
instruction, in the usual case where an object is *not* shared among
threads.
How does it tell whether
But this has been raised before, and was rejected as not worth the
amount of work that would be required to achieve it.
In my understanding, there is an important difference between
it was rejected, and it was not done.
Regards,
Martin
On 9/12/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
Now we are getting into details: you do NOT have to lock
an object to modify its reference count. An atomic
increment/decrement operation is enough.
One could measure the performance hit incurred by using atomic
operations for refcounting by
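One rough way to get a feel for this kind of measurement is a single-threaded micro-benchmark that compares unsynchronized counter updates with synchronized ones. The sketch below is an analogy only: pure Python cannot issue an atomic increment instruction, so a `threading.Lock` stands in for the synchronization cost, and the function names and `N` are assumptions. The ratio it prints says nothing about the figures reported in the thread.

```python
import threading
import timeit


def plain_updates(n):
    """n unsynchronized incref/decref pairs on a local counter."""
    count = 0
    for _ in range(n):
        count += 1
        count -= 1
    return count


def locked_updates(n, lock):
    """The same pairs, each guarded by an uncontended lock."""
    count = 0
    for _ in range(n):
        with lock:
            count += 1
        with lock:
            count -= 1
    return count


N = 100_000
lock = threading.Lock()

t_plain = timeit.timeit(lambda: plain_updates(N), number=5)
t_locked = timeit.timeit(lambda: locked_updates(N, lock), number=5)

print(f"plain:  {t_plain:.3f}s")
print(f"locked: {t_locked:.3f}s ({t_locked / t_plain:.1f}x)")
```

Because only one thread runs, any slowdown is pure synchronization overhead rather than contention, which is the distinction the single-threaded experiment is meant to isolate.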
Brett> We should probably document where all of these globals lists are
Brett> instead of relying on looking for all file level static
Brett> declarations or something.
I smell a wiki page.
Skip
Brett> Or would there be benefit to moving things like this to the
Brett>
Martin> Now we are getting into details: you do NOT have to lock an
Martin> object to modify its reference count. An atomic
Martin> increment/decrement operation is enough.
Implemented in asm I suspect? For common CPUs this could just be part of
the normal Python distribution. For
I was reading GvR's post on this and came up with a theory on how to
tackle the problem.
I ended up putting it in a blog post.
http://www.brainwavelive.com/blog/index.php?/archives/12-Suggestion-for-removing-the-Python-Global-Interpreter-Lock.html
What do you think?
Prateek
On Sep 12,
What do you think?
I think what you are describing is the situation of today,
except in a less-performant way. The kernel *already*
implements such a synchronization server, except that
all CPUs can act as such. You write
Since we are guaranteeing that synchronized code is running on a single
Hi,
I had a whole long email about exactly what I was doing, but I think I'll
get to the point instead. I'm trying to implement a python concurrency API
and would like to use cpython to do it. To do that, I would like to remove
the GIL.
So, since I'm new to interpreter hacking, some help would
1. Some global interpreter state/modules are protected (where are these
globals at?)
It's the interpreter and thread state itself (pystate.h), for the thread
state, also _PyThreadState_Current. Then there is the GC state, in
particular generations. There are various caches and counters also.
On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
1. Some global interpreter state/modules are protected (where are these
globals at?)
It's the interpreter and thread state itself (pystate.h), for the thread
state, also _PyThreadState_Current. Then there is the GC state, in
particular
Justin> Caches seem like they definitely might be a problem. Would you
Justin> mind expanding on this a little? What gets cached and why?
I believe the integer free list falls into this category.
Skip
___
Python-Dev mailing list
It's the interpreter and thread state itself (pystate.h), for the thread
state, also _PyThreadState_Current. Then there is the GC state, in
particular generations. There are various caches and counters also.
Caches seem like they definitely might be a problem. Would you mind
At 10:07 AM 9/11/2007 -0500, Justin Tulloss wrote:
On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
1. Some global interpreter state/modules are protected (where are these
globals at?)
It's the interpreter and thread state itself (pystate.h), for the thread
On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
It's the interpreter and thread state itself (pystate.h), for the thread
state, also _PyThreadState_Current. Then there is the GC state, in
particular generations. There are various caches and counters also.
Caches seem
It's not just caches and counters. It's also every built-in type
structure, builtin module, builtin function... any Python object that's
a built-in, period. That includes things like None, True, and False.
Sure - but those things don't get modified that often, except for their
reference
On Sep 11, 2007, at 3:30 PM, Brett Cannon wrote:
We should probably document where all of these globals lists are
instead of relying on looking for all file level static declarations
or something. Or would there be benefit to moving things like this to
the interpreter struct so that threads
We should probably document where all of these globals lists are
instead of relying on looking for all file level static declarations
or something.
I'm not sure what would be gained here, except for people occasionally
(i.e. every three years) asking how they can best get rid of the GIL.
Or
Phillip J. Eby wrote:
It's also every built-in type
structure, builtin module, builtin function... any Python object
that's a built-in, period.
Where built-in in this context means anything implemented
in C (i.e. it includes extension modules).
--
Greg
Martin v. Löwis wrote:
Sure - but those things don't get modified that often, except for their
reference count.
The reference count is the killer, though -- you have
to lock the object even to do that. And it happens
a LOT, to all objects, including immutable ones.
--
Greg