Re: [Python-Dev] Reworking the GIL

2010-02-02 Thread Stephen J. Turnbull
Antoine Pitrou writes:

 > Yes, in hindsight I think Guido was right.

Guido does too.  One of the benefits of having a time machine is
getting to turn your hindsight into foresight.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Reworking the GIL

2010-02-02 Thread Larry Hastings

Antoine Pitrou wrote:

Also, the MSDN doc (*) says timeBeginPeriod() can have a detrimental effect on
system performance; I don't know how much of it is true.

(*) http://msdn.microsoft.com/en-us/library/dd757624(VS.85).aspx


Indeed it does.  This is ancient, dusty wisdom from the days of 50 MHz 
computers, passed down from one OS generation to the next.  As a (former) 
professional Windows developer I can assure you it was no longer a 
concern even ten years ago; its performance effect was so small as to be 
unmeasurable.


You may call timeBeginPeriod(1) with impunity.


/larry/


Re: [Python-Dev] Reworking the GIL

2010-02-02 Thread Antoine Pitrou

Hello Eric,

> I notice that Guido vetoed this idea, but just in case it should come up
> again, I have some thoughts that likely have already occurred to people,
> but I didn't notice on the list.

Yes, in hindsight I think Guido was right.
(no, I'm not trying to make myself agreeable :-))

> Also, has anybody thought of starting work on making a set of Py* calls
> that can be made without having the GIL acquired?

In the current state of affairs it would be quite difficult. Each memory
allocation can trigger a garbage collection which itself needs the GIL. And
Python is full of memory allocations.
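The point is easy to demonstrate from pure Python using gc's collection callbacks (a later addition than this thread, available in modern CPython): ordinary container allocations, with no explicit gc.collect() call, are enough to trigger collection passes — each of which runs with the GIL held.

```python
import gc

collections = []

def note(phase, info):
    # The interpreter invokes GC callbacks at the start and stop of
    # every collection pass.
    if phase == "start":
        collections.append(info["generation"])

gc.callbacks.append(note)
old = gc.get_threshold()
gc.set_threshold(10)                  # collect after ~10 net container allocations
junk = [[] for _ in range(1000)]      # ordinary allocations, no explicit collect()
gc.set_threshold(*old)
gc.callbacks.remove(note)

print(len(collections) > 0)           # True: allocating alone triggered the GC
```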

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2010-02-02 Thread Neil Hodgson
Eric Hopper:

> I don't suppose it will ever be ported back to Python 2.x?  It doesn't
> look like the whole GIL concept has changed much between Python 2.x and
> 3.x so I expect back-porting it would be pretty easy.

   There was a patch but it has been rejected.
http://bugs.python.org/issue7753

   Neil


Re: [Python-Dev] Reworking the GIL

2010-02-02 Thread Eric Hopper
I recently saw the video of David Beazley's presentation on how poorly
the old GIL implementation handled certain cases and thought "I can fix
that!".  Unfortunately for me, someone else beat me to it, and did a
somewhat better job than I would have, because I wasn't thinking of
doing anything about changing how ticks worked.

Antoine Pitrou wrote:
> - priority requests, which is an option for a thread requesting the
> GIL to be scheduled as soon as possible, and forcibly (rather than any
> other threads). This is meant to be used by GIL-releasing methods such
> as read() on files and sockets. The scheme, again, is very simple:
> when a priority request is done by a thread, the GIL is released as
> soon as possible by the thread holding it (including in the eval
> loop), and then the thread making the priority request is forcibly
> scheduled (by making all other GIL-awaiting threads wait in the
> meantime).
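The quoted scheme can be modelled as a lock built on a condition variable. This is only a toy sketch with invented names, not the actual ceval.c code:

```python
import threading

class PriorityGIL:
    """Toy model of a lock with "priority requests" (hypothetical names;
    a sketch of the quoted idea, not CPython's implementation)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False
        self._priority_waiter = None   # ident of a thread demanding the lock next

    def acquire(self, priority=False):
        me = threading.get_ident()
        with self._cond:
            if priority:
                self._priority_waiter = me
            # Ordinary acquirers must also yield to a pending priority request.
            while self._held or self._priority_waiter not in (None, me):
                self._cond.wait()
            if self._priority_waiter == me:
                self._priority_waiter = None
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()
```

Note that ordinary acquirers now wait on both the lock state and the pending request, which is exactly why such a lock starts to resemble a small scheduler.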

I notice that Guido vetoed this idea, but just in case it should come up
again, I have some thoughts that likely have already occurred to people,
but I didn't notice on the list.

I would be very careful with this.  The goal should be to avoid
re-implementing a scheduler of any kind in Python.  This comes
perilously close.  In my opinion this mechanism should _only_ be used
for signals.

I think implementing this for IO is fraught with peril and would almost
certainly require a scheduler to be implemented in Python in order to
work better than not having it at all while also avoiding pathological
cases.

Also, has anybody thought of starting work on making a set of Py* calls
that can be made without having the GIL acquired?

I don't suppose it will ever be ported back to Python 2.x?  It doesn't
look like the whole GIL concept has changed much between Python 2.x and
3.x so I expect back-porting it would be pretty easy.

This discussion is, of course, rather old.  But I browsed through all
the stuff people said in reply and I (perhaps foolishly) thought my two
cents might be helpful.


-- 
A word is nothing more or less than the series of historical
connotations given to it. That's HOW we derive meaning, and to claim
that there is an arbitrary meaning of words above and beyond the way
people use them is a blatant misunderstanding of the nature of language.
-- Anonymous blogger
-- Eric Hopper (hop...@omnifarious.org http://www.omnifarious.org/~hopper)--




Re: [Python-Dev] Reworking the GIL

2009-11-23 Thread Stefan Ring
> I built something very similar for my company last year, and it’s been running
> flawlessly in production at a few customer sites since, with avg. CPU usage ~50%
> around the clock. I even posted about it on the Python mailing list [1], where
> there was almost no response at that time. I never posted code, though --
> nobody seemed to be too interested.

I never bothered to make this tidy and nice; in particular, the function
naming (PySpecial_*) leaves something to be desired. It's not too bad,
though; it just isn't of commit-ready quality. I don't worry about this
anymore, so I'm just posting what I have. Maybe someone can make use of it.
--- Python-2.5.2/Include/pythread.h.scheduling	2006-06-13 17:04:24.0 +0200
+++ Python-2.5.2/Include/pythread.h	2008-10-16 14:46:07.0 +0200
@@ -40,6 +40,20 @@
 PyAPI_FUNC(void *) PyThread_get_key_value(int);
 PyAPI_FUNC(void) PyThread_delete_key_value(int key);
 
+#ifndef _POSIX_THREADS
+#error Requires POSIX threads
+#endif
+
+PyAPI_FUNC(void *) PyThread_mutex_alloc(void);
+PyAPI_FUNC(void) PyThread_mutex_free(void *);
+PyAPI_FUNC(void) PyThread_mutex_lock(void *);
+PyAPI_FUNC(void) PyThread_mutex_unlock(void *);
+
+PyAPI_FUNC(void *) PyThread_cond_alloc(void);
+PyAPI_FUNC(void) PyThread_cond_free(void *);
+PyAPI_FUNC(void) PyThread_cond_wait(void *, void *);
+PyAPI_FUNC(void) PyThread_cond_signal(void *);
+
 #ifdef __cplusplus
 }
 #endif
--- Python-2.5.2/Python/thread.c.scheduling	2006-07-21 09:59:47.0 +0200
+++ Python-2.5.2/Python/thread.c	2008-10-16 14:46:07.0 +0200
@@ -155,6 +155,56 @@
 #endif
 */
 
+void *PyThread_mutex_alloc(void)
+{
+	pthread_mutex_t *m = malloc(sizeof(pthread_mutex_t));
+	if (pthread_mutex_init(m, NULL))
+		Py_FatalError("PyThread_mutex_alloc: pthread_mutex_init failed");
+	return m;
+}
+
+void PyThread_mutex_free(void *m)
+{
+	if (pthread_mutex_destroy(m))
+		Py_FatalError("PyThread_mutex_free: pthread_mutex_destroy failed");
+	free(m);
+}
+
+void PyThread_mutex_lock(void *m)
+{
+	pthread_mutex_lock(m);
+}
+
+void PyThread_mutex_unlock(void *m)
+{
+	pthread_mutex_unlock(m);
+}
+
+void *PyThread_cond_alloc(void)
+{
+	pthread_cond_t *c = malloc(sizeof(pthread_cond_t));
+	if (pthread_cond_init(c, NULL))
+		Py_FatalError("PyThread_cond_alloc: pthread_cond_init failed");
+	return c;
+}
+
+void PyThread_cond_free(void *c)
+{
+	if (pthread_cond_destroy(c))
+		Py_FatalError("PyThread_cond_free: pthread_cond_destroy failed");
+	free(c);
+}
+
+void PyThread_cond_wait(void *c, void *m)
+{
+	pthread_cond_wait(c, m);
+}
+
+void PyThread_cond_signal(void *c)
+{
+	pthread_cond_signal(c);
+}
+
 /* return the current thread stack size */
 size_t
 PyThread_get_stacksize(void)
--- Python-2.5.2/Python/ceval.c.scheduling	2008-01-23 21:09:39.0 +0100
+++ Python-2.5.2/Python/ceval.c	2008-10-16 14:47:07.0 +0200
@@ -210,7 +210,31 @@
 #endif
 #include "pythread.h"
 
-static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */
+typedef void *PySpecial_cond_type;
+
+struct special_linkstruct {
+	PySpecial_cond_type wait;
+	struct special_linkstruct *queue_next, *free_next;
+	int in_use;
+};
+
+typedef void *PySpecial_lock_type;
+
+typedef struct {
+	PySpecial_lock_type the_lock;
+	struct special_linkstruct *wait_queue, *wait_last, *free_queue;
+} PySpecialSemaphore;
+
+void
+PySpecial_init(PySpecialSemaphore *s)
+{
+	s->the_lock = PyThread_mutex_alloc();
+	s->wait_queue = NULL;
+	s->wait_last = NULL;
+	s->free_queue = NULL;
+}
+
+static PySpecialSemaphore *interpreter_lock = NULL; /* This is the GIL */
 static long main_thread = 0;
 
 int
@@ -219,26 +243,100 @@
 	return interpreter_lock != 0;
 }
 
+static PySpecialSemaphore *allocate_special(void)
+{
+	PySpecialSemaphore *s = malloc(sizeof(PySpecialSemaphore));
+	PySpecial_init(s);
+	return s;
+}
+
+static struct special_linkstruct *allocate_special_linkstruct(void)
+{
+	struct special_linkstruct *ls = malloc(sizeof(struct special_linkstruct));
+	ls->wait = PyThread_cond_alloc();
+	ls->queue_next = NULL;
+	ls->free_next = NULL;
+	ls->in_use = 0;
+	return ls;
+}
+
+static void PySpecial_Lock(PySpecialSemaphore *s)
+{
+	struct special_linkstruct *ls;
+
+	PyThread_mutex_lock(s->the_lock);
+
+	if (!s->free_queue)
+		s->free_queue = allocate_special_linkstruct();
+
+	ls = s->free_queue;
+	s->free_queue = ls->free_next;
+
+	if (!s->wait_queue)
+	{
+		ls->in_use = 1;
+		s->wait_queue = ls;
+		s->wait_last = ls;
+		PyThread_mutex_unlock(s->the_lock);
+		return;
+	}
+
+	assert(s->wait_queue != ls);
+	assert(s->wait_last != ls);
+	assert(s->wait_last->queue_next == NULL);
+	assert(!ls->in_use);
+	s->wait_last->queue_next = ls;
+	s->wait_last = ls;
+	ls->in_use = 1;
+
+	while (s->wait_queue != ls)
+		PyThread_cond_wait(ls->wait, s->the_lock);
+
+	PyThread_mutex_unlock(s->the_lock);
+}
+
+static void PySpecial_Unlock(PySpecialSemaphore *s)
+{
+	struct special_linkstruct *ls;
+
+	PyThread_mutex_lock(s->the_lock);
+	ls = s->wait_queue;
+	assert(ls->in_use);
+
+	s->wait_queu

Re: [Python-Dev] Reworking the GIL

2009-11-23 Thread Nick Coghlan
Stefan Ring wrote:
> [2] http://www.bestinclass.dk/index.php/2009/10/python-vs-clojure-evolving/
> [3] www.dabeaz.com/python/GIL.pdf
> 
> PS On a slightly different note, I came across some Python bashing [2] yesterday,
> and somehow got from there to David Beazley’s presentation about the GIL [3].
> While I don’t mind the bashing, the observations about the GIL seem quite unfair
> to me, because David’s measurements were made on Mac OS X with its horribly slow
> pthreads functions. I was not able to measure any slowdown on Linux.

We care about Mac OS X though, so even if the contention wasn't as bad
on a different OS, the Mac downsides matter.

With the GIL updates in place, it would be interesting to see that
analysis redone for 2.7/3.2 though.

Regards,
Nick.

P.S. As far as interest in the idea goes, the GIL is one of those areas
where it takes a fairly rare combination of interest, expertise and
established credibility to propose a change and get assent to it. You'll
notice that even Antoine had to resort to the "if nobody objects soon,
I'm checking this in" tactic to garner any responses. It's an area where
even those with relevant expertise still have to put aside a fair chunk
of time in order to properly review a proposed change :)

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


Re: [Python-Dev] Reworking the GIL

2009-11-23 Thread Stefan Ring
Hello,

I built something very similar for my company last year, and it’s been running
flawlessly in production at a few customer sites since, with avg. CPU usage ~50%
around the clock. I even posted about it on the Python mailing list [1], where
there was almost no response at that time. I never posted code, though --
nobody seemed to be too interested.

I am well aware that your current work is a lot more far-reaching than what I’ve
done, which is basically just a FIFO scheduler. I later added scheduling
priorities, which don’t work too well because the amount of time used for a
"tick" can vary by several orders of magnitude, as you know.

Thought you might be interested.

Regards
Stefan

[1] http://mail.python.org/pipermail/python-dev/2008-March/077814.html
[2] http://www.bestinclass.dk/index.php/2009/10/python-vs-clojure-evolving/
[3] www.dabeaz.com/python/GIL.pdf

PS On a slightly different note, I came across some Python bashing [2] yesterday,
and somehow got from there to David Beazley’s presentation about the GIL [3].
While I don’t mind the bashing, the observations about the GIL seem quite unfair
to me, because David’s measurements were made on Mac OS X with its horribly slow
pthreads functions. I was not able to measure any slowdown on Linux.




Re: [Python-Dev] Reworking the GIL

2009-11-10 Thread Antoine Pitrou

Hello again,

I've now removed priority requests, tried to improve the internal doc a 
bit, and merged the changes into py3k.

Afterwards, the new Windows 7 buildbot hung in test_multiprocessing,
but I don't know whether it's related.

Regards

Antoine.


Guido van Rossum  python.org> writes:
> 
> 
> I would remove them -- if and when we find there's a need for
> something like them I suspect it won't look like what you currently
> have, so it's better not to complexificate your current patch with
> them. (I would remove all traces, including the conditions passed into
> the end macros.)
> 
> >> My only suggestion so far: the description could use more explicit
> >> documentation on the various variables and macros and how they
> >> combine.
> >
> > Is it before or after
> > http://mail.python.org/pipermail/python-checkins/2009-November/087482.html ?
> 
> After. While that is already really helpful, not all the code is
> easily linked back to paragraphs in that comment block, and some
> variables are not mentioned by name in the block.
>




Re: [Python-Dev] Reworking the GIL

2009-11-08 Thread Antoine Pitrou
Baptiste Lepilleur  gmail.com> writes:
> 
> I've tried, but there is no change in the results (the regexp does not use \w &
> co but specifies a lot of unicode ranges). All strings are already of unicode
> type in 2.6.

No they aren't. You should add "from __future__ import unicode_literals" at the
start of your script and run it again.

> Hmmm, I was confusing it with other modules (bzip2 & hashlib?). Looking back at
> the results of your benchmark it's obvious. Is there a place where the list of
> functions releasing the GIL is available? I did not see anything in the
> bz2.compress documentation.

No, there isn't. You'd have to test, or read the source code.
But bz2 and zlib, for example, do release the GIL.
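Since bz2 does drop the GIL around its C compression loop, two compress calls can genuinely overlap in threads. A small sketch (the speedup itself is only visible on a multi-core machine, but the correctness is easy to check anywhere):

```python
import bz2
import threading

data = b"x" * 2_000_000
results = [None, None]

def work(i):
    # bz2.compress releases the GIL while the C library runs,
    # so these two calls can execute in parallel on two cores.
    results[i] = bz2.compress(data)

threads = [threading.Thread(target=work, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert all(bz2.decompress(r) == data for r in results)
```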

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-08 Thread Baptiste Lepilleur
2009/11/7 Antoine Pitrou 

>
> Hello again,
>
> > It shows that, on my platform for this specific benchmark:
> >   *  newgil manages to leverage a significant amount of parallelism
> > (1.7) where python 3.1 does not (3.1 is 80% slower)
>
> I think you are mistaken:
>
> -j0 (main thread only)
> newgil: 47.483s, 47.605s, 47.512s
> -j4 (4 consumer threads, main thread producing/waiting)
> newgil: 48.428s, 49.217s
>
> The runtimes are actually the same, so newgil doesn't leverage anything.
> However, it doesn't degrade performance like 2.x/3.1 does :-)
>

Oops, I was comparing to the 3.1 -j4 times, which makes no sense. One would think
I wanted to see that result, since I thought the GIL was released :/. This
greatly reduces the interest of this benchmark...


> >   * 3.1 is more than 2 times slower than python 2.6 on this
> > benchmark
>
> That's the most worrying outcome I'd say. Are you sure the benchmark
> really does the same thing? Under 2.6, you should add re.UNICODE to the
> regular expression flags so as to match the 3.x semantics.
>

I've tried, but there is no change in the results (the regexp does not use \w &
co but specifies a lot of unicode ranges). All strings are already of unicode
type in 2.6.


>  > [if I understood correctly, in 3.x regexes release the GIL].
>
> Unless I've missed something it doesn't, no.
>
Hmmm, I was confusing it with other modules (bzip2 & hashlib?). Looking back at
the results of your benchmark it's obvious. Is there a place where the list
of functions releasing the GIL is available? I did not see anything in the
bz2.compress documentation.


Re: [Python-Dev] Reworking the GIL

2009-11-07 Thread Antoine Pitrou

Hello again,

> It shows that, on my platform for this specific benchmark:
>   *  newgil manages to leverage a significant amount of parallelism
> (1.7) where python 3.1 does not (3.1 is 80% slower)

I think you are mistaken:

-j0 (main thread only)
newgil: 47.483s, 47.605s, 47.512s
-j4 (4 consumer threads, main thread producing/waiting)
newgil: 48.428s, 49.217s

The runtimes are actually the same, so newgil doesn't leverage anything.
However, it doesn't degrade performance like 2.x/3.1 does :-)

>   * newgil has a small impact on non-multi-threaded execution
> (~1-2%) [may be worth investigating]

It goes from very slightly slower to very slightly faster and it is
likely to be caused by variations in generated output from the compiler.

>   * 3.1 is more than 2 times slower than python 2.6 on this
> benchmark

That's the most worrying outcome I'd say. Are you sure the benchmark
really does the same thing? Under 2.6, you should add re.UNICODE to the
regular expression flags so as to match the 3.x semantics.

> [if I understood correctly, in 3.x regexes release the GIL].

Unless I've missed something it doesn't, no.
This could be a separate opportunity for optimization, if someone wants
to take a look at it.

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-07 Thread Baptiste Lepilleur
2009/11/7 Antoine Pitrou 

>
> [...]
> So, to sum it up, the way the current GIL manages to have good latencies is
> by
> issuing an unreasonable number of system calls on a contended lock, and
> potentially killing throughput performance (this depends on the OS too,
> because
> numbers under Linux are not so catastrophic).
>
Ah, now I remember reading this in the analysis that was published!

I made another benchmark using one of my scripts, which I ported to Python 3 (and
simplified a bit). I only test the total execution time. Tests were done on
Windows XP SP3; the processor is an Intel Core 2 Quad Q9300 (4 cores).

You can get the script from:
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/purekeyword-py26-3k.py
Script + test doc (940KB):
http://gaiacrtn.free.fr/py/benchmark-kwd-newgil/benchmark-kwd-newgil.tar.bz2

The threaded loop is:
for match in self.punctuation_pattern.finditer(document):
    word = document[last_start_index:match.start()]
    if len(word) > 1 and len(word) < MAX_KEYWORD_LENGTH:
        words[word] = words.get(word, 0) + 1
    last_start_index = match.end()
    if word:
        word_count += 1
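For reference, a self-contained version of that loop (the punctuation pattern and MAX_KEYWORD_LENGTH here are stand-ins; the original script defines its own set of Unicode ranges):

```python
import re

MAX_KEYWORD_LENGTH = 64                       # assumed value, for illustration
punctuation_pattern = re.compile(r"[\s.,;:!?]+", re.UNICODE)

def count_words(document):
    words = {}
    word_count = 0
    last_start_index = 0
    word = ""
    for match in punctuation_pattern.finditer(document):
        word = document[last_start_index:match.start()]
        if 1 < len(word) < MAX_KEYWORD_LENGTH:
            words[word] = words.get(word, 0) + 1
        last_start_index = match.end()
        if word:
            word_count += 1
    return words, word_count

words, n = count_words("the quick brown fox, the lazy dog.")
```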

Here are the results:

-j0 (main thread only)
2.5.2 : 17.991s, 17.947s, 17.780s
2.6.2 : 19.071s, 19.023s, 19.054s
3.1.1 : 46.384s, 46.321s, 46.425s
newgil: 47.483s, 47.605s, 47.512s

-j4 (4 consumer threads, main thread producing/waiting)
2.5.2 : 31.105s, 30.568s
2.6.2 : 31.550s, 30.599s
3.1.1 : 85.114s, 85.185s
newgil: 48.428s, 49.217s

It shows that, on my platform, for this specific benchmark:

   - newgil manages to leverage a significant amount of parallelism (1.7x),
   where python 3.1 does not (3.1 is 80% slower)
   - newgil has a small impact on non-multi-threaded execution (~1-2%) [may
   be worth investigating]
   - 3.1 is more than 2 times slower than python 2.6 on this benchmark
   - 2.6 is "only" 65% slower when run with multiple threads, compared to the
   80% slower of 3.1.

Newgil is a vast improvement, as it manages to leverage the short time the
GIL is released by finditer [if I understood correctly, in 3.x regexes release
the GIL].

What worries me is the single-threaded performance degradation between 2.6
and 3.1 on this test. Could the added GIL release/acquire on each finditer
call explain this?


Re: [Python-Dev] Reworking the GIL

2009-11-07 Thread Guido van Rossum
On Sat, Nov 7, 2009 at 9:01 AM, Antoine Pitrou  wrote:
>
> Hello Guido,
>
>> How close are you to merging this into the Py3k branch? It looks like
>> a solid piece of work, that can only get better in the period between
>> now and the release of 3.2. But I don't want to rush you, and I only
>> have had a brief look at your code.
>
> The code is ready. Priority requests are already disabled, I'm just
> wondering whether to remove them from the code, or leave them there in
> case someone thinks they're useful. I suppose removing them is ok.

I would remove them -- if and when we find there's a need for
something like them I suspect it won't look like what you currently
have, so it's better not to complexificate your current patch with
them. (I would remove all traces, including the conditions passed into
the end macros.)

>> My only suggestion so far: the description could use more explicit
>> documentation on the various variables and macros and how they
>> combine.
>
> Is it before or after
> http://mail.python.org/pipermail/python-checkins/2009-November/087482.html ?

After. While that is already really helpful, not all the code is
easily linked back to paragraphs in that comment block, and some
variables are not mentioned by name in the block.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Reworking the GIL

2009-11-07 Thread Antoine Pitrou

Hello Guido,

> How close are you to merging this into the Py3k branch? It looks like
> a solid piece of work, that can only get better in the period between
> now and the release of 3.2. But I don't want to rush you, and I only
> have had a brief look at your code.

The code is ready. Priority requests are already disabled, I'm just
wondering whether to remove them from the code, or leave them there in
case someone thinks they're useful. I suppose removing them is ok.

> My only suggestion so far: the description could use more explicit
> documentation on the various variables and macros and how they
> combine.

Is it before or after
http://mail.python.org/pipermail/python-checkins/2009-November/087482.html ?

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-07 Thread Guido van Rossum
Antoine,

How close are you to merging this into the Py3k branch? It looks like
a solid piece of work, that can only get better in the period between
now and the release of 3.2. But I don't want to rush you, and I only
have had a brief look at your code. (I whipped up a small Dave Beazley
example and was impressed by the performance of your code compared to
the original py3k branch -- from 150% to 100% CPU usage according to
top with only 2 threads.)

My only suggestion so far: the description could use more explicit
documentation on the various variables and macros and how they
combine.

I also expect that priority requests aren't important; it somehow
seems strange that if multiple threads are all doing I/O, each new
thread whose I/O completes would get to preempt whoever else is active
immediately. (Also the choice of *not* making a priority request when
a read returns no bytes seems strange -- if I read the code
correctly.)

Anyway, thanks for this work!

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Reworking the GIL

2009-11-07 Thread Antoine Pitrou

Hello,

> Solaris X86, 16 cores: some python extensions are likely missing (see
> config.log)
> Windows XP SP3, 4 cores: all python extensions but TCL (I didn't bother
> checking why it failed, as it is not used in the benchmark). It is a
> release build.
> 
> The results look promising but I let you share your conclusions (some latency
> results seem a bit strange from my understanding).

Thank you! The latency results don't look that strange to me.

If you're surprised that py3k shows better latency than newgil on the "pi
calculation" workload, it's easy to understand why: py3k speculatively releases
the GIL so often on that workload (every 100 opcodes) that the latencies are
indeed very good, but if you look at the corresponding throughput numbers they
are dismal (your Solaris box shows it falling to less than 20% with two threads
running compared to the baseline number for single-thread execution, and on your
Windows box the number is hardly better, at 45%).

So, to sum it up, the way the current GIL manages to have good latencies is by
issuing an unreasonable number of system calls on a contended lock, and
potentially killing throughput performance (this depends on the OS too, because
numbers under Linux are not so catastrophic).

The new GIL, on the other hand, is much more balanced, in that it achieves rather
predictable latencies (especially when you don't overcommit the OS by issuing
more computational threads than you have CPU cores) while preserving throughput
performance.
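For reference, the latency/throughput trade-off described here is what eventually shipped in Python 3.2 as the switch interval, tunable from Python code:

```python
import sys

default = sys.getswitchinterval()    # 0.005 s (5 ms) by default in CPython 3.2+
sys.setswitchinterval(0.001)         # ask for more frequent thread switches:
                                     # lower latency at some throughput cost
```

This replaced the old tick-based sys.setcheckinterval() knob of the 2.x/3.1 GIL.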

> Side-note: PCBuild requires nasmw.exe but it no longer exists in the latest
> version. I had to rename nasm.exe to nasmw.exe. Would be nice to add this to
> the readme to avoid confusion...

You should file a bug on http://bugs.python.org

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-07 Thread Baptiste Lepilleur
Hi Antoine,
I was finally able to compile py3k and run the benchmark (my compilation
issue was caused by checking out on Windows and compiling on Unix; some
Makefile templates are missing correct EOL properties in SVN, I think).

The benchmark results can be obtained from:
http://gaiacrtn.free.fr/py/benchmark-newgil/benchmark-newgil.tar.bz2
and viewed from:
http://gaiacrtn.free.fr/py/benchmark-newgil/

I ran the benchmark on two platforms:

   - Solaris X86, 16 cores: some python extensions are likely missing (see
   config.log)
   - Windows XP SP3, 4 cores: all python extensions but TCL (I didn't bother
   checking why it failed, as it is not used in the benchmark). It is a release
   build.

The results look promising, but I'll let you draw your own conclusions (some
latency results seem a bit strange from my understanding).

Side-note: PCBuild requires nasmw.exe, but it no longer exists in the latest
version; I had to rename nasm.exe to nasmw.exe. It would be nice to add this to
the readme to avoid confusion...

Baptiste.

2009/11/1 Antoine Pitrou 

>
> Hello again,
>
> Brett Cannon  python.org> writes:
> >
> > I think it's worth it. Removal of the GIL is a totally open-ended problem
> > with no solution in sight. This, on the other hand, is a performance
> benefit
> > now. I say move forward with this. If it happens to be short-lived
> because
> > some actually figures out how to remove the GIL then great, but is that
> > really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL
> work
> into the py3k branch, with priority requests disabled.
>


Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Martin v. Löwis skrev:

I did, and it does nothing of what I suggested. I am sure I can make the
Windows GIL in ceval_gil.h and the mutex in thread_nt.h a lot more precise
and efficient.



> Hmm. I'm skeptical that your code makes it more accurate, and I
> completely fail to see that it makes it more efficient (by what
> measurement of efficiency?)
>
> Also, why would making it more accurate make it better? IIUC,
> accuracy is completely irrelevant here, though efficiency
> (low overhead) does matter.

This is the kind of code I was talking about, from ceval_gil.h:

r = WaitForMultipleObjects(2, objects, TRUE, milliseconds);

I would turn on the multimedia timer (it is not on by default), and replace
this call with a loop, approximately like this:

for (;;) {
    r = WaitForMultipleObjects(2, objects, TRUE, 0);
    /* blah blah blah */
    QueryPerformanceCounter(&cnt);
    if (cnt > timeout)
        break;
    Sleep(0);
}

And the timeout "milliseconds" would now be computed by querying the
performance counter, instead of unreliably by the Windows NT kernel.



> Hmm. This creates a busy wait loop; if you add larger sleep values,
> then it loses accuracy.
Actually a usleep looks like this, and the call to the wait function
must go into the for loop. But no, it's not a busy sleep.


static int inited = 0;
static __int64 hz;
static double dhz;
const double sleep_granularity = 2.0E-3;  /* 2 ms, expressed in seconds */

void usleep(long us)
{
    __int64 cnt, end;
    double diff;
    if (!inited) {
        timeBeginPeriod(1);  /* raise the timer resolution to 1 ms */
        QueryPerformanceFrequency((LARGE_INTEGER*)&hz);
        dhz = (double)hz;
        inited = 1;
    }
    QueryPerformanceCounter((LARGE_INTEGER*)&cnt);

    end = cnt + (__int64)(1.0E-6 * (double)us * dhz);  /* us -> counter ticks */
    for (;;) {
        QueryPerformanceCounter((LARGE_INTEGER*)&cnt);
        if (cnt >= end)
            break;
        diff = (double)(end - cnt) / dhz;  /* remaining time, in seconds */
        if (diff > sleep_granularity)
            Sleep((DWORD)((diff - sleep_granularity) * 1.0E3));  /* Sleep() takes ms */
        else
            Sleep(0);
    }
}



> Why not just call timeBeginPeriod, and then rely on the higher clock
> rate for WaitForMultipleObjects?

That is what I suggested when Antoine said 1-2 ms was enough.



Sturla






Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Jeffrey Yasskin
On Mon, Nov 2, 2009 at 4:15 AM, Antoine Pitrou  wrote:
> Martin v. Löwis  v.loewis.de> writes:
>>
> [gil_drop_request]
>> Even if it is read from memory, I still wonder what might happen on
>> systems that require explicit memory barriers to synchronize across
>> CPUs. What if CPU 1 keeps reading a 0 value out of its cache, even
>> though CPU 2 has written a 1 value a long time ago?
>>
>> IIUC, any (most?) pthread calls would cause synchronization in that
>> case, which is why applications also use locks for reading:
>>
>> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10
>>
>> Of course, on x86, you won't see any issues, because it's cache-coherent
>> anyway.
>
> I think there are two things here:
> - all machines Python runs on should AFAIK be cache-coherent: CPUs synchronize
> their views of memory in a rather timely fashion.
> - memory ordering: writes made by a CPU can be seen in a different order by
> another CPU (i.e. CPU 1 writes A before B, but CPU 2 sees B written before
> A). I don't see how this can apply to gil_drop_request, since it's a single
> variable
> (and, moreover, only a single bit of it is significant).
>
> (there's an explanation of memory ordering issues here:
> http://www.linuxjournal.com/article/8211)
>
> As a side note, I remember Jeffrey Yasskin trying to specify an ordering model
> for Python code
> (see http://code.google.com/p/unladen-swallow/wiki/MemoryModel).

Note that that memory model was only for Python code; the C code
implementing it is subject to the C memory model, which is weaker (and
not even fully defined until the next C standard comes out).

To be really safe, we ought to have a couple primitives implementing
"atomic" rather than just "volatile" instructions, but until then a
signal that's just saying "do something" rather than "here's some data
you should look at" should be ok as a volatile int.

I'd like to look at the patch in detail, but I can't guarantee that
I'll get to it in a timely manner. I'd say check it in and let more
threading experts look at it in the tree. We've got some time before a
release for people to fix problems and make further improvements. +1
to Martin's request for detailed documentation though. :)


Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Daniel Stutzbach
On Mon, Nov 2, 2009 at 11:27 AM, "Martin v. Löwis" wrote:

> Hmm. This creates a busy wait loop; if you add larger sleep values,
> then it loses accuracy.
>

I thought that at first, too, but then I checked the documentation for
Sleep(0):

"A value of zero causes the thread to relinquish the remainder of its time
slice to any other thread of equal priority that is ready to run."

(this is not to say that I think the solution with Sleep is worthwhile,
though...)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC 


Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Martin v. Löwis
> I did, and it does nothing of what I suggested. I am sure I can make the
> Windows GIL in ceval_gil.h and the mutex in thread_nt.h a lot more precise
> and efficient.

Hmm. I'm skeptical that your code makes it more accurate, and I
completely fail to see that it makes it more efficient (by what
measurement of efficiency?)

Also, why would making it more accurate make it better? IIUC,
accuracy is completely irrelevant here, though efficiency
(low overhead) does matter.

> This is the kind of code I was talking about, from ceval_gil.h:
> 
> r = WaitForMultipleObjects(2, objects, TRUE, milliseconds);
> 
> I would turn on the multimedia timer (it is not on by default), and replace
> this call with a loop, approximately like this:
> 
> for (;;) {
>   r = WaitForMultipleObjects(2, objects, TRUE, 0);
>   /* blah blah blah */
>   QueryPerformanceCounter(&cnt);
>   if (cnt > timeout) break;
>   Sleep(0);
> }
> 
> And the timeout "milliseconds" would now be computed from querying the
> performance counter, instead of unreliably by the Windows NT kernel.

Hmm. This creates a busy wait loop; if you add larger sleep values,
then it loses accuracy.

Why not just call timeBeginPeriod, and then rely on the higher clock
rate for WaitForMultipleObjects?

Regards,
Martin


Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Antoine Pitrou
Sturla Molden  molden.no> writes:
> 
> I'd love to try, but I don't have VC++ to build Python, I use GCC on 
> Windows.

You can use Visual Studio Express, which is free (gratis).





Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread John Arbash Meinel
Sturla Molden wrote:
> Antoine Pitrou skrev:
>> It certainly is.
>> But once again, I'm no Windows developer and I don't have a native
> Windows host
>> to test on; therefore someone else (you?) has to try.
>>   
> I'd love to try, but I don't have VC++ to build Python, I use GCC on
> Windows.
> 
> Anyway, the first thing to try then is to call
>  
>   timeBeginPeriod(1);
> 
> once on startup, and leave the rest of the code as it is. If 2-4 ms is
> sufficient we can use timeBeginPeriod(2), etc. Microsoft is claiming
> Windows performs better with high granularity, which is why it is 10 ms
> by default.
> 
> 
> Sturla

That page claims:
  Windows uses the lowest value (that is, highest resolution) requested
  by any process.

I would posit that the chance of having some random process on your
machine request a high-speed timer is high enough that the overhead for
Python doing the same is probably low.

John
=:->


Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Antoine Pitrou skrev:

It certainly is.
But once again, I'm no Windows developer and I don't have a native Windows host
to test on; therefore someone else (you?) has to try.
  
I'd love to try, but I don't have VC++ to build Python, I use GCC on 
Windows.


Anyway, the first thing to try then is to call
 
  timeBeginPeriod(1);


once on startup, and leave the rest of the code as it is. If 2-4 ms is 
sufficient we can use timeBeginPeriod(2), etc. Microsoft is claiming 
Windows performs better with high granularity, which is why it is 10 ms 
by default.



Sturla








Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Antoine Pitrou
Sturla Molden  molden.no> writes:
> 
> By the way Antoine, if you think granularity of 1-2 ms is sufficient, 

It certainly is.
But once again, I'm no Windows developer and I don't have a native Windows host
to test on; therefore someone else (you?) has to try.

Also, the MSDN doc (*) says timeBeginPeriod() can have a detrimental effect on
system performance; I don't know how much of it is true.

(*) http://msdn.microsoft.com/en-us/library/dd757624(VS.85).aspx




Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Sturla Molden skrev:


And just so you don't ask: There should not just be a Sleep(0) in the 
loop, but a sleep that gets shorter and shorter until a lower 
threshold is reached, where it skips to Sleep(0). That way we avoid 
hammering on WaitForMultipleObjects and QueryPerformanceCounter more 
than we need. And for all that to work better than just giving a 
timeout to WaitForMultipleObjects, we need the multimedia timer turned 
on.


The important thing about multimedia timer is that the granularity of 
Sleep() and WaitForMultipleObjects() by default is "10 ms or at most 20 
ms". But if we call


timeBeginPeriod(1);

the MM timer is on and granularity becomes 1 ms or at most 2 ms. But we 
can get even more precise than that by hammering on Sleep(0) for 
timeouts less than 2 ms. We can get typical granularity in the order of 
10 µs, with the occasional 100 µs now and then. I know this because I 
was using Windows 2000 to generate TTL signals on the LPT port some 
years ago, and watched them on the oscilloscope.


~ 15 ms granularity is Windows default. But that is brain dead.

By the way Antoine, if you think granularity of 1-2 ms is sufficient, 
i.e. no need for µs precision, then just calling timeBeginPeriod(1) will 
be sufficient.


Sturla










Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Sturla Molden skrev:


I would turn on the multimedia timer (it is not on by default), and replace
this call with a loop, approximately like this:

for (;;) {
  r = WaitForMultipleObjects(2, objects, TRUE, 0);
  /* blah blah blah */
  QueryPerformanceCounter(&cnt);
  if (cnt > timeout) break;
  Sleep(0);
}


And just so you don't ask: There should not just be a Sleep(0) in the 
loop, but a sleep that gets shorter and shorter until a lower threshold 
is reached, where it skips to Sleep(0). That way we avoid hammering on 
WaitForMultipleObjects and QueryPerformanceCounter more than we need. 
And for all that to work better than just giving a timeout to 
WaitForMultipleObjects, we need the multimedia timer turned on.


Sturla





Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Antoine Pitrou
Sturla Molden  molden.no> writes:
> 
> And the timeout "milliseconds" would now be computed from querying the
> performance counter, instead of unreliably by the Windows NT kernel.

Could you actually test your proposal under Windows and report what kind of
concrete benefits it brings?

Thank you

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Martin v. Löwis skrev:

Maybe you should study the code under discussion before making such
a proposal.

I did, and it does nothing of what I suggested. I am sure I can make the 
Windows GIL in ceval_gil.h and the mutex in thread_nt.h a lot more precise 
and efficient.


This is the kind of code I was talking about, from ceval_gil.h:

r = WaitForMultipleObjects(2, objects, TRUE, milliseconds);

I would turn on the multimedia timer (it is not on by default), and replace
this call with a loop, approximately like this:

for (;;) {
  r = WaitForMultipleObjects(2, objects, TRUE, 0);
  /* blah blah blah */
  QueryPerformanceCounter(&cnt);
  if (cnt > timeout) break;
  Sleep(0);
}

And the timeout "milliseconds" would now be computed from querying the 
performance counter, instead of unreliably by the Windows NT kernel.



Sturla












Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Martin v. Löwis
> - all machines Python runs on should AFAIK be cache-coherent: CPUs synchronize
> their views of memory in a rather timely fashion.

Ok. I thought that Itanium was an example where this assumption is
actually violated (as many web pages claim such a restriction), however,
it seems that on Itanium, caches are indeed synchronized using MESI.

So claims wrt. lack of cache consistency on Itanium, and the need for
barrier instruction, seem to be caused by the Itanium feature that
allows the processor to fetch memory out-of-order, i.e. an earlier read
may see a later memory state. This is apparently used to satisfy reads
as soon as the cache line is read (so that the cache line can be
discarded earlier). Wrt. your requirement ("rather timely fashion"),
this still seems to be fine.

Still, this all needs to be documented in the code :-)

Regards,
Martin



Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Antoine Pitrou
Martin v. Löwis  v.loewis.de> writes:
> 
[gil_drop_request]
> Even if it is read from memory, I still wonder what might happen on
> systems that require explicit memory barriers to synchronize across
> CPUs. What if CPU 1 keeps reading a 0 value out of its cache, even
> though CPU 2 has written a 1 value a long time ago?
> 
> IIUC, any (most?) pthread calls would cause synchronization in that
> case, which is why applications also use locks for reading:
> 
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10
> 
> Of course, on x86, you won't see any issues, because it's cache-coherent
> anyway.

I think there are two things here:
- all machines Python runs on should AFAIK be cache-coherent: CPUs synchronize
their views of memory in a rather timely fashion.
- memory ordering: writes made by a CPU can be seen in a different order by
another CPU (i.e. CPU 1 writes A before B, but CPU 2 sees B written before A). I
don't see how this can apply to gil_drop_request, since it's a single variable
(and, moreover, only a single bit of it is significant).

(there's an explanation of memory ordering issues here:
http://www.linuxjournal.com/article/8211)

As a side note, I remember Jeffrey Yasskin trying to specify an ordering model
for Python code
(see http://code.google.com/p/unladen-swallow/wiki/MemoryModel).

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Martin v. Löwis
>> c) is the gil_drop_request handling thread-safe? If your answer is
>>"yes", you should add a comment as to what the assumptions are of
>>this code (ISTM that multiple threads may simultaneously attempt
>>to set the drop request, overlapping with the holding thread actually
>>dropping the GIL).
> 
> The gil_drop_request is volatile mainly out of precaution, but it's probably
> not necessary. It is only written (set/cleared) when holding the gil_mutex;
> however, it can be read while not holding that mutex. Exceptionally reading
> the "wrong" value would not have any adverse consequences, it would just
> decrease the switching quality at the said instant.

I think it then needs definitely to be volatile - otherwise, the
compiler may chose to cache it in a register (assuming enough registers
are available), instead of re-reading it from memory each time (which is
exactly what volatile then forces it to).

Even if it is read from memory, I still wonder what might happen on
systems that require explicit memory barriers to synchronize across
CPUs. What if CPU 1 keeps reading a 0 value out of its cache, even
though CPU 2 has written a 1 value a long time ago?

IIUC, any (most?) pthread calls would cause synchronization in that
case, which is why applications also use locks for reading:

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10

Of course, on x86, you won't see any issues, because it's cache-coherent
anyway.

Regards,
Martin


Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Antoine Pitrou
Martin v. Löwis  v.loewis.de> writes:
> 
> I've looked at this part of the implementation, and have a few comments.
> a) why is gil_interval a double? Make it an int, counting in
>microseconds.

Ok, I'll do that.

> b) notice that, on Windows, minimum wait resolution may be as large as
>15ms (e.g. on XP, depending on the hardware). Not sure what this
>means for WaitForMultipleObjects; most likely, if you ask for a 5ms
>wait, it waits until the next clock tick. It would be bad if, on
>some systems, a wait of 5ms would mean that it immediately returns.

I'll let someone else give an authoritative answer. I did the Windows version
mainly by reading online MSDN docs, and testing it worked fine in an XP VM.

Anyway, here is what the online docs have to say :

« The system clock "ticks" at a constant rate. If the time-out interval is less
than the resolution of the system clock, the wait may time out in less than the
specified length of time. If the time-out interval is greater than one tick but
less than two, the wait can be anywhere between one and two ticks, and so on. »

So it seems that the timeout always happens on a Windows tick boundary, which
means that it can immediately return, but only very rarely so and never more
than once in a row...
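The tick-boundary behaviour is easy to probe from Python: repeatedly time a short timed wait and look at the spread. This is a measurement sketch, not CPython code; `threading.Event.wait` ultimately maps to the platform's timed-wait primitive, and on POSIX systems it will not return early, whereas the MSDN text quoted above says a Windows wait may.

```python
import threading
import time

ev = threading.Event()  # never set, so every wait() times out

samples = []
for _ in range(20):
    t0 = time.perf_counter()
    ev.wait(0.005)  # ask for a 5 ms timed wait
    samples.append(time.perf_counter() - t0)

shortest, longest = min(samples), max(samples)
# The gap between `shortest`/`longest` and the requested 5 ms is a rough
# measure of the platform's timer resolution.
```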

> c) is the gil_drop_request handling thread-safe? If your answer is
>"yes", you should add a comment as to what the assumptions are of
>this code (ISTM that multiple threads may simultaneously attempt
>to set the drop request, overlapping with the holding thread actually
>dropping the GIL).

The gil_drop_request is volatile mainly out of precaution, but it's probably not
necessary. It is only written (set/cleared) when holding the gil_mutex; however,
it can be read while not holding that mutex. Exceptionally reading the "wrong"
value would not have any adverse consequences, it would just decrease the
switching quality at the said instant.

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Martin v. Löwis
Sturla Molden wrote:
> Martin v. Löwis skrev:
>> b) notice that, on Windows, minimum wait resolution may be as large as
>>15ms (e.g. on XP, depending on the hardware). Not sure what this
>>means for WaitForMultipleObjects; most likely, if you ask for a 5ms
>>wait, it waits until the next clock tick. It would be bad if, on
>>some systems, a wait of 5ms would mean that it immediately returns.
>>   
> Which is why one should use multimedia timers with QPC on Windows.

Maybe you should study the code under discussion before making such
a proposal.

Regards,
Martin


Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Sturla Molden

Martin v. Löwis skrev:

b) notice that, on Windows, minimum wait resolution may be as large as
   15ms (e.g. on XP, depending on the hardware). Not sure what this
   means for WaitForMultipleObjects; most likely, if you ask for a 5ms
   wait, it waits until the next clock tick. It would be bad if, on
   some systems, a wait of 5ms would mean that it immediately returns.
  

Which is why one should use multimedia timers with QPC on Windows.

To get a wait function with much better resolution than Windows' 
default, do this:


1. Set a high resolution with timeBeginPeriod.

2. Loop using a time-out of 0 for WaitForMultipleObjects and put a 
Sleep(0) in the loop not to burn the CPU. Call QPC to get a precise 
timing, and break the loop when the requested time-out has been reached.


3. When you are done, call timeEndPeriod to turn the multimedia timer off.

This is how you create usleep() in Windows as well: just loop on QPC and 
Sleep(0) after setting timeBeginPeriod(1).









Re: [Python-Dev] Reworking the GIL

2009-11-02 Thread Martin v. Löwis
> The new GIL does away with this by ditching _Py_Ticker entirely and
> instead using a fixed interval (by default 5 milliseconds, but settable)
> after which we ask the main thread to release the GIL and let another
> thread be scheduled.

I've looked at this part of the implementation, and have a few comments.
a) why is gil_interval a double? Make it an int, counting in
   microseconds.
b) notice that, on Windows, minimum wait resolution may be as large as
   15ms (e.g. on XP, depending on the hardware). Not sure what this
   means for WaitForMultipleObjects; most likely, if you ask for a 5ms
   wait, it waits until the next clock tick. It would be bad if, on
   some systems, a wait of 5ms would mean that it immediately returns.
c) is the gil_drop_request handling thread-safe? If your answer is
   "yes", you should add a comment as to what the assumptions are of
   this code (ISTM that multiple threads may simultaneously attempt
   to set the drop request, overlapping with the holding thread actually
   dropping the GIL). There is also the question whether it is
   thread-safe to write into a "volatile int"; I keep forgetting the
   answer.

Regards,
Martin




Re: [Python-Dev] Reworking the GIL

2009-11-01 Thread Gregory P. Smith
On Sun, Nov 1, 2009 at 3:33 AM, Antoine Pitrou  wrote:
>
> Hello again,
>
> Brett Cannon  python.org> writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> someone actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)

+1 Good idea.  That's the best way to make sure this work gets
anywhere.  It can be iterated on from there if anyone has objections.


Re: [Python-Dev] Reworking the GIL

2009-11-01 Thread Christian Heimes
Antoine Pitrou wrote:
> Christian Heimes  cheimes.de> writes:
>> +1 from me. I trust you like Brett does.
>>
>> How much work would it cost to make your patch optional at compile time?
> 
> Quite a bit, because it changes the logic for processing asynchronous pending
> calls (signals) and asynchronous exceptions in the eval loop. The #defines
> would get quite convoluted, I think; I'd prefer not to do that.

Based on the new piece of information I totally agree.

> I don't really think so. The GIL is taken and released much more predictably
> than it was before. The thing that might be worth checking is a workload with
> many threads (say 50 or 100). Does anyone have that?

I don't have an application that works on Python 3 and uses that many
threads, sorry.

Christian


Re: [Python-Dev] Reworking the GIL

2009-11-01 Thread Antoine Pitrou
Christian Heimes  cheimes.de> writes:
> 
> +1 from me. I trust you like Brett does.
> 
> How much work would it cost to make your patch optional at compile time?

Quite a bit, because it changes the logic for processing asynchronous pending
calls (signals) and asynchronous exceptions in the eval loop. The #defines would
get quite convoluted, I think; I'd prefer not to do that.

> For what it's worth we could compare your work on different machines and
> on different platforms before it gets enabled by default. Can you
> imagine scenarios where your implementation might be slower than the
> current GIL implementation?

I don't really think so. The GIL is taken and released much more predictably
than it was before. The thing that might be worth checking is a workload with
many threads (say 50 or 100). Does anyone have that?
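For reference, the interval discussed in this thread did ship as a tunable, `sys.setswitchinterval()`, in Python 3.2. A crude many-thread workload of the kind asked about here can be set up as below; the 50-thread count comes from the message, while the work function itself is made up for illustration:

```python
import sys
import threading

old = sys.getswitchinterval()   # the default is 0.005 s (5 ms)
sys.setswitchinterval(0.001)    # request more frequent GIL switches

results = []
lock = threading.Lock()

def work(n):
    total = sum(range(n))       # pure-Python CPU work that holds the GIL
    with lock:
        results.append(total)

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

sys.setswitchinterval(old)      # restore the default
```

Timing this loop under different switch intervals is how one would compare switching overhead.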

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-11-01 Thread Christian Heimes
Antoine Pitrou wrote:
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
> 
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)

+1 from me. I trust you like Brett does.

How much work would it cost to make your patch optional at compile time?
For what it's worth we could compare your work on different machines and
on different platforms before it gets enabled by default. Can you
imagine scenarios where your implementation might be slower than the
current GIL implementation?



Re: [Python-Dev] Reworking the GIL

2009-11-01 Thread Brett Cannon
On Sun, Nov 1, 2009 at 03:33, Antoine Pitrou  wrote:
>
> Hello again,
>
> Brett Cannon  python.org> writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> someone actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.

This will be a nice Py3K carrot!

>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please voice up :)

I know I personally trust you to not mess it up, Antoine, but that
might also come from mental exhaustion and laziness. =)

-Brett


Re: [Python-Dev] Reworking the GIL

2009-11-01 Thread Antoine Pitrou

Hello again,

Brett Cannon  python.org> writes:
> 
> I think it's worth it. Removal of the GIL is a totally open-ended problem 
> with no solution in sight. This, on the other hand, is a performance benefit 
> now. I say move forward with this. If it happens to be short-lived because 
> someone actually figures out how to remove the GIL then great, but is that 
> really going to happen between now and Python 3.2? I doubt it.

Based on this whole discussion, I think I am going to merge the new GIL work
into the py3k branch, with priority requests disabled.

If you think this is premature or uncalled for, or if you just want to review
the changes before making a judgement, please voice up :)

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-10-28 Thread Antoine Pitrou
Kristján Valur Jónsson  ccpgames.com> writes:
> 
> In my experience (from stackless python) using priority wakeup for IO can
> result in very erratic scheduling when there is much IO going on, every IO
> trumping another.

I whipped up a trivial multithreaded HTTP server using
socketserver.ThreadingMixin and wsgiref, and used apachebench against it with a
reasonable concurrency level (10 requests at once). Enabling/disabling priority
requests doesn't seem to make a difference.
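For anyone wanting to reproduce this, a minimal version of the server described (socketserver.ThreadingMixIn combined with wsgiref) looks roughly like this; the handler body is illustrative, and an ephemeral port is used instead of a fixed one:

```python
import threading
import urllib.request
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server

class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    daemon_threads = True  # don't block interpreter exit on open requests

def app(environ, start_response):
    # Trivial WSGI application: one thread per request via ThreadingMixIn.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

server = make_server('127.0.0.1', 0, app, server_class=ThreadingWSGIServer)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = 'http://127.0.0.1:%d/' % server.server_port
body = urllib.request.urlopen(url).read()
server.shutdown()
```

Pointing apachebench (or any concurrent client) at such a server is then a matter of raising the request concurrency.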

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Cameron Simpson
On 26Oct2009 22:45, exar...@twistedmatrix.com  wrote:
| On 04:18 pm, dan...@stutzbachenterprises.com wrote:
| >On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou wrote:
| >Do we really need priority requests at all?  They seem counter to your
| >desire for simplicity and allowing the operating system's
| >scheduler to do
| >its work.
| 
| Despite what I said above, however, I would also take a default
| position against adding any kind of more advanced scheduling system
| here.  It would, perhaps, make sense to expose the APIs for
| controlling the platform scheduler, though.

+1 to both sentences from me.
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

Plague, Famine, Pestilence, and C++ stalk the land. We're doomed! Doomed!
- Simon E Spero


Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread sstein...@gmail.com


On Oct 26, 2009, at 6:45 PM, exar...@twistedmatrix.com wrote:
Despite what I said above, however, I would also take a default  
position against adding any kind of more advanced scheduling system  
here.  It would, perhaps, make sense to expose the APIs for  
controlling the platform scheduler, though.


I would also like to have an exposed API and optional profiling  
(optional as in conditional compilation, not as in some sort of  
profiling 'flag' that slows down non-profiling runs) so that you can  
see what's going on well enough to use the API to tune a particular  
platform for a particular workload.


S




Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread exarkun

On 04:18 pm, dan...@stutzbachenterprises.com wrote:
On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou wrote:

Er, I prefer to keep things simple. If you have lots of I/O you should 
probably use an event loop rather than separate threads.


On Windows, using a single-threaded event loop is sometimes impossible. 
WaitForMultipleObjects(), which is the Windows equivalent to select() or 
poll(), can handle a maximum of only 64 objects.


This is only partially accurate.  For one thing, WaitForMultipleObjects 
calls are nestable.  For another thing, Windows also has I/O completion 
ports which are not limited to 64 event sources.  The situation is 
actually better than on a lot of POSIXes.

Do we really need priority requests at all?  They seem counter to your
desire for simplicity and allowing the operating system's scheduler to
do its work.


Despite what I said above, however, I would also take a default position 
against adding any kind of more advanced scheduling system here.  It 
would, perhaps, make sense to expose the APIs for controlling the 
platform scheduler, though.


Jean-Paul


Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Collin Winter
On Mon, Oct 26, 2009 at 2:43 PM, Antoine Pitrou  wrote:
> Collin Winter  gmail.com> writes:
> [the Dave Beazley benchmark]
>> The results below are benchmarking py3k as the control, newgil as the
>> experiment. When it says "x% faster", that is a measure of newgil's
>> performance over py3k's.
>>
>> With two threads:
>>
>> iterative_count:
>> Min: 0.336573 -> 0.387782: 13.21% slower  # I've run this
>> configuration multiple times and gotten the same slowdown.
>> Avg: 0.338473 -> 0.418559: 19.13% slower
>
> Those numbers are not very in line with the other "iterative_count" results.
> Since iterative_count just runs the loop N times in a row, results should be
> proportional to the number N ("number of threads").
>
> Besides, there's no reason for single-threaded performance to be degraded
> since the fast path of the eval loop actually got a bit streamlined (there
> is no volatile ticker to decrement).

I agree those numbers are out of line with the others and make no
sense. I've run it with two threads several times and the results are
consistent on this machine. I'm digging into it a bit more.

Collin


Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Antoine Pitrou
Collin Winter  gmail.com> writes:
> 
> My results for an 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):

Thanks!


[the Dave Beazley benchmark]
> The results below are benchmarking py3k as the control, newgil as the
> experiment. When it says "x% faster", that is a measure of newgil's
> performance over py3k's.
> 
> With two threads:
> 
> iterative_count:
> Min: 0.336573 -> 0.387782: 13.21% slower  # I've run this
> configuration multiple times and gotten the same slowdown.
> Avg: 0.338473 -> 0.418559: 19.13% slower

Those numbers are not very in line with the other "iterative_count" results.
Since iterative_count just runs the loop N times in a row, results should be
proportional to the number N ("number of threads").

Besides, there's no reason for single-threaded performance to be degraded since
the fast path of the eval loop actually got a bit streamlined (there is no
volatile ticker to decrement).

> I'd be interested in multithreaded benchmarks with less-homogenous workloads.

So would I.

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Collin Winter
On Sun, Oct 25, 2009 at 1:22 PM, Antoine Pitrou  wrote:
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)

My results for an 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):

Control (py3k @ r75723)

--- Throughput ---

Pi calculation (Python)

threads=1: 633 iterations/s.
threads=2: 468 ( 74 %)
threads=3: 443 ( 70 %)
threads=4: 442 ( 69 %)

regular expression (C)

threads=1: 281 iterations/s.
threads=2: 282 ( 100 %)
threads=3: 282 ( 100 %)
threads=4: 282 ( 100 %)

bz2 compression (C)

threads=1: 379 iterations/s.
threads=2: 735 ( 193 %)
threads=3: 733 ( 193 %)
threads=4: 724 ( 190 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 6 ms.)
CPU threads=4: 2 ms. (std dev: 3 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 975 ms. (std dev: 577 ms.)
CPU threads=2: 1035 ms. (std dev: 571 ms.)
CPU threads=3: 1098 ms. (std dev: 556 ms.)
CPU threads=4: 1195 ms. (std dev: 557 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 4 ms. (std dev: 5 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 4 ms.)



Experiment (newgil branch @ r75723)

--- Throughput ---

Pi calculation (Python)

threads=1: 651 iterations/s.
threads=2: 643 ( 98 %)
threads=3: 637 ( 97 %)
threads=4: 625 ( 95 %)

regular expression (C)

threads=1: 298 iterations/s.
threads=2: 296 ( 99 %)
threads=3: 288 ( 96 %)
threads=4: 287 ( 96 %)

bz2 compression (C)

threads=1: 378 iterations/s.
threads=2: 720 ( 190 %)
threads=3: 724 ( 191 %)
threads=4: 718 ( 189 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 5 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 1 ms.)
CPU threads=3: 2 ms. (std dev: 2 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)


I also ran this through Unladen Swallow's threading microbenchmark,
which is a straight copy of what David Beazley was experimenting with
(simply iterating over 100 ints in pure Python) [1].
"iterative_count" is doing the loops one after the other,
"threaded_count" is doing the loops in parallel using threads.
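For readers without an Unladen Swallow checkout, the benchmark pair is roughly the following (a paraphrase with names and a smaller loop count of my own choosing; the real code is in bm_threading.py, linked below):

```python
import threading

N = 1_000_000  # per-task loop count (assumed here; the real benchmark uses more)

def count(n):
    # Pure-Python busy loop: the GIL-contention workload Beazley studied.
    while n > 0:
        n -= 1

def iterative_count(tasks=2):
    # Run the loops one after another, on a single thread.
    for _ in range(tasks):
        count(N)

def threaded_count(tasks=2):
    # Run the same loops in parallel threads, all contending for the GIL.
    threads = [threading.Thread(target=count, args=(N,)) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

threaded_count is the configuration that exhibits the classic old-GIL slowdown; the "x% faster" numbers below are measured on exactly this shape of workload.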

The results below are benchmarking py3k as the control, newgil as the
experiment. When it says "x% faster", that is a measure of newgil's
performance over py3k's.

With two threads:

iterative_count:
Min: 0.336573 -> 0.387782: 13.21% slower  # I've run this
configuration multiple times and gotten the same slowdown.
Avg: 0.338473 -> 0.418559: 19.13% slower
Significant (t=-38.434785, a=0.95)

threaded_count:
Min: 0.529859 -> 0.397134: 33.42% faster
Avg: 0.581786 -> 0.429933: 35.32% faster
Significant (t=70.100445, a=0.95)


With four threads:

iterative_count:
Min: 0.766617 -> 0.734354: 4.39% faster
Avg: 0.771954 -> 0.751374: 2.74% faster
Significant (t=22.164103, a=0.95)
Stddev: 0.00262 -> 0.00891: 70.53% larger

threaded_count:
Min: 1.175750 -> 0.829181: 41.80% faster
Avg: 1.224157 -> 0.867506: 41.11% faster
Significant (t=161.715477, a=0.95)
Stddev: 0.01900 -> 0.01120: 69.65% smaller


With eight threads:

iterative_count:
Min: 1.527794 -> 1.447421: 5.55% faster
Avg: 1.536911 -> 1.479940: 3.85% faster
Significant (t=35.559595, a=0.95)
Stddev: 0.00394 -> 0.01553: 74.61% larger

threaded_count:
Min: 2.424553 -> 1.677180: 44.56% faster
Avg: 2.484922 -> 1.723093: 44.21% faster
Significant (t=184.766131, a=0.95)
Stddev: 0.02874 -> 0.02956: 2.78% larger


I'd be interested in multithreaded benchmarks with less-homogenous workloads.

Collin Winter

[1] - 
http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_threading.py


Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Antoine Pitrou
Daniel Stutzbach  stutzbachenterprises.com> writes:
> 
> Do we really need priority requests at all?  They seem counter to your
> desire for simplicity and allowing the operating system's scheduler to do
> its work.

No, they can be disabled (removed) if we prefer. With priority requests
disabled, latency results becomes less excellent but still quite good.

Running ccbench on a dual core machine gives the following latency results,
first with, then without, priority requests.

--- Latency --- (with prio requests)

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 0 ms. (std dev: 2 ms.)
CPU threads=3: 0 ms. (std dev: 2 ms.)
CPU threads=4: 0 ms. (std dev: 2 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 3 ms. (std dev: 2 ms.)
CPU threads=2: 3 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 2 ms.)
CPU threads=4: 4 ms. (std dev: 3 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 2 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 2 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)

--- Latency --- (without prio requests)

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 2 ms.)
CPU threads=1: 5 ms. (std dev: 0 ms.)
CPU threads=2: 3 ms. (std dev: 3 ms.)
CPU threads=3: 9 ms. (std dev: 7 ms.)
CPU threads=4: 22 ms. (std dev: 23 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 1 ms.)
CPU threads=1: 8 ms. (std dev: 2 ms.)
CPU threads=2: 5 ms. (std dev: 4 ms.)
CPU threads=3: 21 ms. (std dev: 32 ms.)
CPU threads=4: 19 ms. (std dev: 26 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 1 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)





Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Daniel Stutzbach
On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou wrote:

> Er, I prefer to keep things simple. If you have lots of I/O you should
> probably use an event loop rather than separate threads.
>

On Windows, using a single-threaded event loop is sometimes
impossible.  WaitForMultipleObjects(), which is the Windows equivalent to
select() or poll(), can handle a maximum of only 64 objects.
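For comparison, the single-threaded event-loop style Antoine refers to can be sketched with nothing but the stdlib select module; all names here are mine, not from the thread:

```python
import select
import socket

def main():
    # One thread, one select() loop: the event-loop alternative to
    # spawning a thread per connection.
    a, b = socket.socketpair()
    a.sendall(b"ping")
    a.close()                       # writer done; the reader will see EOF

    received = []
    open_socks = [b]
    while open_socks:
        readable, _, _ = select.select(open_socks, [], [], 1.0)
        for sock in readable:
            data = sock.recv(4096)
            if data:
                received.append(data)
            else:                   # EOF: peer closed, drop the socket
                sock.close()
                open_socks.remove(sock)
    return received
```

Here main() collects [b"ping"] without any worker threads; the 64-object ceiling Daniel mentions is a limit of WaitForMultipleObjects specifically, while select()-style loops have their own, different limits per platform.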

Do we really need priority requests at all?  They seem counter to your
desire for simplicity and allowing the operating system's scheduler to do
its work.

That said, if a thread's time budget is merely paused during I/O rather than
reset, then a thread making frequent (but short) I/O requests cannot starve
the system.
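The pause-versus-reset distinction can be made concrete with a toy budget object (names and structure are mine; nothing of this form exists in the patch as posted):

```python
class TimeBudget:
    """Toy model of a per-thread execution budget under the GIL."""

    def __init__(self, quantum_ms=5.0):
        self.quantum_ms = quantum_ms
        self.remaining_ms = quantum_ms

    def run_for(self, ms):
        # Returns True once the budget is exhausted and the thread
        # should hand the GIL back.
        self.remaining_ms -= ms
        return self.remaining_ms <= 0

    def on_io(self, policy="pause"):
        if policy == "reset":
            # Resetting lets a thread doing many short I/O calls start
            # over with a full budget each time -- it can starve others.
            self.remaining_ms = self.quantum_ms
        # "pause": do nothing; the clock simply stops while the GIL is
        # released, and the partly-used budget is still owed on return.
```

With policy="pause", three 2 ms bursts separated by I/O still exhaust a 5 ms quantum; with "reset" they never do, which is exactly the starvation Daniel describes.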

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC 


Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Antoine Pitrou
Sturla Molden  molden.no> writes:
> 
> Antoine Pitrou skrev:
> > - priority requests, which is an option for a thread requesting the GIL
> > to be scheduled as soon as possible, and forcibly (rather than any other
> > threads). T
> Should a priority request for the GIL take a priority number?

Er, I prefer to keep things simple. If you have lots of I/O you should probably
use an event loop rather than separate threads.

> Related issue: Should Python threads have priorities? They are after all 
> real OS threads.

Well, precisely they are OS threads, and the OS already assigns them (static or
dynamic) priorities. No need to replicate this.

(to answer another notion expressed in another message, there's no "round-robin"
scheduling either)

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Sturla Molden

Antoine Pitrou skrev:

- priority requests, which is an option for a thread requesting the GIL
to be scheduled as soon as possible, and forcibly (rather than any other
threads). T

Should a priority request for the GIL take a priority number?

- If two threads make priority requests for the GIL, the one with the
higher priority should get the GIL first.

- If a thread with a low priority makes a priority request for the GIL,
it should not be allowed to "preempt" (take the GIL away from) a
higher-priority thread, in which case the priority request would be
ignored.

Related issue: Should Python threads have priorities? They are after all 
real OS threads.


S.M.



Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread sstein...@gmail.com


On Oct 26, 2009, at 10:09 AM, Kristján Valur Jónsson wrote:





-Original Message-
From: python-dev-bounces+kristjan=ccpgames@python.org
[mailto:python-dev-bounces+kristjan=ccpgames@python.org] On  
Behalf

Of Sturla Molden
time.sleep should generate a priority request to re-acquire the GIL; and
so should all other blocking standard library functions with a time-out.


I don't agree.  You have to be very careful with priority.   
time.sleep() does not promise to wake up in any timely manner, and  
neither do the timeout functions.  Rather, the timeout is a way to  
prevent infinite wait.


In my experience (from stackless python) using priority wakeup for  
IO can result in very erratic scheduling when there is much IO going  
on, every IO trumping another.  You should stick to round robin  
except for very special and carefully analysed cases.


All the IO tasks can also go in their own round robin so that CPU time  
is correctly shared among all waiting IO tasks.


IOW, to make sure that all IO tasks get a fair share *in relation to  
all other IO tasks*.


Tasks can be put into the IO round robin when they "pull the IO alarm"  
so to speak, so there's no need to decide before-hand which task goes  
in which round robin pool.
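The two-pool arrangement described above, in sketch form (names and structure are mine, taken from neither CPython nor Stackless):

```python
from collections import deque

class TwoPoolScheduler:
    """CPU tasks rotate in one ring; a task that 'pulls the IO alarm'
    moves to an IO ring that gets one turn per rotation, so IO tasks
    share that turn fairly instead of each IO trumping everything."""

    def __init__(self):
        self.cpu = deque()
        self.io = deque()
        self._io_turn = False

    def add(self, task):
        self.cpu.append(task)

    def io_alarm(self, task):
        # Called when a task's I/O completes and it wants the GIL back.
        if task in self.cpu:
            self.cpu.remove(task)
        self.io.append(task)

    def next_task(self):
        self._io_turn = not self._io_turn
        if self._io_turn and self.io:
            task = self.io.popleft()   # FIFO among IO tasks: fair wakeup
            self.cpu.append(task)      # after its turn it is ordinary again
            return task
        if self.cpu:
            task = self.cpu.popleft()
            self.cpu.append(task)
            return task
        if self.io:                    # only IO waiters are left
            task = self.io.popleft()
            self.cpu.append(task)
            return task
        return None
```

With two CPU tasks and one IO wakeup, the schedule comes out interleaved rather than letting the IO task monopolize the lock.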


I'm not familiar with this particular code in Python, but I've used  
this in other systems for years to make sure that IO tasks don't  
starve the rest of the system and that the most "insistent" IO task  
doesn't starve all the others.


S



Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Kristján Valur Jónsson


> -Original Message-
> From: python-dev-bounces+kristjan=ccpgames@python.org
> [mailto:python-dev-bounces+kristjan=ccpgames@python.org] On Behalf
> Of Sturla Molden
> time.sleep should generate a priority request to re-acquire the GIL; and
> so should all other blocking standard library functions with a time-out.

I don't agree.  You have to be very careful with priority.  time.sleep() does 
not promise to wake up in any timely manner, and neither do the timeout 
functions.  Rather, the timeout is a way to prevent infinite wait.

In my experience (from stackless python) using priority wakeup for IO can 
result in very erratic scheduling when there is much IO going on, every IO 
trumping another.  You should stick to round robin except for very special and 
carefully analysed cases.
K


Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Sturla Molden

Antoine Pitrou skrev:

- priority requests, which is an option for a thread requesting the GIL
to be scheduled as soon as possible, and forcibly (rather than any other
threads). 
So Python threads become preemptive rather than cooperative? That would 
be great. :-)


time.sleep should generate a priority request to re-acquire the GIL; and 
so should all other blocking standard library functions with a time-out.
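As a toy model of what a priority request means (a pure-Python illustration under my own names; the real mechanism lives in the C-level GIL code of the newgil branch):

```python
import threading

class ToyGIL:
    """A lock where one waiter may flag itself as 'priority': the next
    release hands the lock to that waiter ahead of ordinary waiters."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False
        self._priority = None          # the thread that asked for priority

    def acquire(self, priority=False):
        me = threading.current_thread()
        with self._cond:
            if priority:
                self._priority = me
            # Ordinary waiters also yield while a priority request is pending.
            while self._held or self._priority not in (None, me):
                self._cond.wait()
            if self._priority is me:
                self._priority = None
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()
```

A thread waking from time.sleep would call acquire(priority=True) in this model and jump ahead of threads merely rotating through the eval loop.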



S.M.



Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Andrew MacIntyre

Brett Cannon wrote:
It's up to Andrew to get the support in. While I have faith he will, 
this is why we have been scaling back the support for alternative OSs 
for a while and will continue to do so. I suspect the day Andrew stops 
keeping up will be the day we push to have OS/2 be externally maintained.


Notwithstanding my desire to keep OS/2 supported in the Python tree,
keeping up has been more difficult of late:
- OS/2 is unquestionably a "legacy" environment, with system APIs
different in flavour and semantics from the current mainstream (though
surprisingly capable in many ways despite its age).
- The EMX runtime my OS/2 port currently relies on to abstract the
system API to a Posix-ish API is itself a legacy package, essentially
unmaintained for some years :-(  This has been a source of increasing
pain as Python has moved with the mainstream... with regard to Unicode
support and threads in conjunction with multi-processing, in particular.

Real Life hasn't been favourably disposed either...

I have refrained from applying the extensive patches required to make
the port feature complete for 2.6 and later while I investigate an
alternate Posix emulating runtime (derived from FreeBSD's C library,
and which is used by Mozilla on OS/2), which would allow me to dispense
with most of these patches.  But it has an issue or two of its own...

The cost in effort has been compounded by effectively having to try and
maintain two ports - 2.x and 3.x.  And the 3.x port has suffered more
as its demands are higher.

So while I asked to keep the OS/2 thread support alive, if a decision
were to be taken to remove OS/2 support from the Python 3.x sources I
could live with that.  A completed migration to Mercurial might well
make future port maintenance easier for me.

Regards,
Andrew.

--
-
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: andy...@bullseye.apana.org.au  (pref) | Snail: PO Box 370
   andy...@pcug.org.au (alt) |Belconnen ACT 2616
Web:http://www.andymac.org/   |Australia


Re: [Python-Dev] Reworking the GIL

2009-10-26 Thread Antoine Pitrou
Terry Reedy  udel.edu> writes:
> 
> I am curious as to whether the entire mechanism is or can be turned off 
> when not needed -- when there are not threads (other than the main, 
> starting thread)?

It is an implicit feature: when no thread is waiting on the GIL, the GIL-holding
thread isn't notified and doesn't try to release it at all (in the eval loop,
that is; GIL-releasing C extensions still release it).

Note that "no thread is waiting on the GIL" can mean one of two things:
- either there is only one Python thread
- or the other Python threads are doing things with the GIL released (zlib/bz2
compression, waiting on I/O, sleep()ing, etc.)

So, yes, it automatically "turns itself off".

Regards

Antoine.




Re: [Python-Dev] Reworking the GIL

2009-10-25 Thread Terry Reedy

Antoine Pitrou wrote:

Hello there,

The last couple of days I've been working on an experimental rewrite of
the GIL. Since the work has been turning out rather successful (or, at
least, not totally useless and crashing!) I thought I'd announce it
here.


I am curious as to whether the entire mechanism is or can be turned off 
when not needed -- when there are not threads (other than the main, 
starting thread)?


tjr



Re: [Python-Dev] Reworking the GIL

2009-10-25 Thread Brett Cannon
>
> [SNIP - a lot of detail on what sounds like a good design]
>
> Now what remains to be done?
>
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
> (I get good results under the Windows XP VM but I feel that a VM is not
> an ideal setup for a concurrency benchmark)
>
> Of course, studying and reviewing the code is welcome. As for
> integrating it into the mainline py3k branch, I guess we have to answer
> these questions:
> - is the approach interesting? (we could decide that it's just not worth
> it, and that a good GIL can only be a dead (removed) GIL)
>

I think it's worth it. Removal of the GIL is a totally open-ended problem
with no solution in sight. This, on the other hand, is a performance benefit
now. I say move forward with this. If it happens to be short-lived because
someone actually figures out how to remove the GIL then great, but is that
really going to happen between now and Python 3.2? I doubt it.


> - is the patch good, mature and debugged enough?
> - how do we deal with the unsupported platforms (POSIX and Windows
> support should cover most bases, but the fate of OS/2 support depends on
> Andrew)?
>
>
It's up to Andrew to get the support in. While I have faith he will, this is
why we have been scaling back the support for alternative OSs for a while
and will continue to do so. I suspect the day Andrew stops keeping up will
be the day we push to have OS/2 be externally maintained.

-Brett