Re: [Python-Dev] PEP 384: Defining a Stable ABI

2009-05-26 Thread Martin v. Löwis
> Now, with the PEP, I have a feeling that the Python C-API
> will in effect be limited to what's in the PEP's idea of
> a usable ABI and open up the non-included public C-APIs
> to the same rate of change as the private APIs.

That's certainly not the plan. Instead, the plan is to have
a stable ABI. The policy on the API isn't affected, except
for restricting changes to the API that would break the ABI.

>> During the compilation of applications, the preprocessor macro
>> Py_LIMITED_API must be defined. Doing so will hide all definitions
>> that are not part of the ABI.
> 
> So extensions wanting to use the full Python C-API as documented
> in the C-API docs will still be able to do this, right ?

Correct. They would link to the version-specific DLL on Windows.
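
For illustration, a minimal sketch of what the opt-in looks like on the
extension side, assuming the macro behaves as described in the PEP text
quoted above:

    /* Sketch only: defining the macro before including Python.h hides
       every definition that is not part of the stable ABI. */
    #define Py_LIMITED_API
    #include <Python.h>

Extensions that want the full API simply leave the macro undefined and
link against the version-specific DLL as before.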

>> The structure of type objects is not available to applications;
>> declaration of "static" type objects is not possible anymore
>> (for applications using this ABI).
> 
> Hmm, that's going to create big problems for extensions that
> want to expose a C-API for their types: Type checks are normally
> done by pointer comparison using those static type objects.

I don't see the problem. During module initialization, you
create the type object and store it in a global variable, and
then both clients and the module compare against the stored
pointer.
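
For illustration, a hedged sketch of that pattern, assuming a heap-type
factory in the style the PEP proposes (PyType_FromSpec); all names below
are invented and error handling is abbreviated:

    #define Py_LIMITED_API
    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        double value;
    } MyObject;

    /* Filled in once at module initialization; both this module and its
       C-API clients compare against this pointer instead of a static
       PyTypeObject. */
    static PyObject *MyType = NULL;

    static PyType_Slot my_slots[] = {
        {0, NULL}
    };

    static PyType_Spec my_spec = {
        "mymod.MyType",        /* name */
        sizeof(MyObject),      /* basicsize */
        0,                     /* itemsize */
        Py_TPFLAGS_DEFAULT,    /* flags */
        my_slots
    };

    static struct PyModuleDef mymodule = {
        PyModuleDef_HEAD_INIT, "mymod", NULL, -1, NULL
    };

    PyMODINIT_FUNC
    PyInit_mymod(void)
    {
        PyObject *m = PyModule_Create(&mymodule);
        if (m == NULL)
            return NULL;
        MyType = PyType_FromSpec(&my_spec);   /* created once, stored globally */
        if (MyType == NULL) {
            Py_DECREF(m);
            return NULL;
        }
        Py_INCREF(MyType);                    /* PyModule_AddObject steals a ref */
        PyModule_AddObject(m, "MyType", MyType);
        return m;
    }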

>> Function-like macros (in particular, field access macros) remain
>> available to applications, but get replaced by function calls
>> (unless their definition only refers to features of the ABI, such
>> as the various _Check macros)
> 
> Including Py_INCREF()/Py_DECREF() ?

Yes, although some people are requesting that these become functions.

>> Excluded Functions
>> ------------------
>>
>> Functions declared in the following header files are not part
>> of the ABI:
>> - cellobject.h
>> - classobject.h
>> - code.h
>> - frameobject.h
>> - funcobject.h
>> - genobject.h
>> - pyarena.h
>> - pydebug.h
>> - symtable.h
>> - token.h
>> - traceback.h
> 
> I don't think that's feasible: you basically remove all introspection
> functions that way.
> 
> This will need a more fine-grained approach.

What specifically is it that you want to do in a module that you
couldn't do anymore?

>> On Windows, applications shall link with python3.dll;
> 
> You mean: extensions that were compiled with Py_LIMITED_API, right ?

Correct, see "Terminology" in the PEP.

> 
>> an import
>> library python3.lib will be available. This DLL will redirect all of
>> its API functions through /export linker options to the full
>> interpreter DLL, i.e. python3y.dll.
> 
> What if you mix extensions that use the full C-API with ones
> that restrict themselves to the limited version ?

Some link against python3.dll, others against python32.dll (say).

> Would creating a Python object in a full-API extension and
> free'ing it in a limited-API extension cause problems ?

No problem that I can see.

>> This PEP will be implemented in a branch, allowing users to check
>> whether their modules conform to the ABI. To simplify this testing, an
>> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing
>> type object layout, to let users postpone rewriting all types. When
>> this branch is merged into the 3.2 code base, this macro will
>> be removed.
> 
> Now I'm confused again: this sounds a lot like you do want all extension
> writers to only use the limited API.

I certainly want to support as many modules as reasonable with the PEP.
Whether or not developers then choose to build version-independent
binaries is certainly outside the scope of the PEP - it only specifies
action items for Python, not for application authors.

>>> Something I haven't seen explicitly mentioned as yet (in the PEP or the
>>> python-dev list discussion) are the memory management APIs and the FILE*
>>> APIs which can cause the MSVCRT versioning issues on Windows.
>>>
>>> Those would either need to be excluded from the stable ABI or else
>>> changed to use opaque pointers.
>> Good point. As a separate issue, I would actually like to deprecate,
>> then remove these APIs. I had originally hoped that this would happen
>> for 3.0 already, alas, nobody worked on it.
>>
>> In any case, I have removed them from the ABI now.
> 
> How do you expect Python extensions to allocate memory and objects
> in a platform independent way without those APIs ?

I have only removed functions from the ABI that have FILE* in their
signatures.

> And as an aside: Which API families are you referring to ? PyMem_Malloc,
> PyObject_Malloc, or PyObject_New ?

None of those. PyRun_AnyFileFlags and friends.
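
For illustration, one hedged way an ABI-restricted extension could still
run a script without passing a FILE* across the boundary is to open and
read the file on the Python side; the helper below is invented, error
handling is abbreviated, and it assumes PyRun_SimpleString itself stays
in the ABI (it takes no FILE*):

    #include <Python.h>

    static int
    run_script_no_FILE(const char *path)
    {
        PyObject *io = NULL, *fobj = NULL, *text = NULL, *bytes = NULL;
        int rc = -1;

        io = PyImport_ImportModule("io");
        if (io == NULL)
            goto done;
        fobj = PyObject_CallMethod(io, "open", "ss", path, "r");  /* io.open(path, "r") */
        if (fobj == NULL)
            goto done;
        text = PyObject_CallMethod(fobj, "read", NULL);
        if (text == NULL)
            goto done;
        bytes = PyUnicode_AsUTF8String(text);
        if (bytes == NULL)
            goto done;
        rc = PyRun_SimpleString(PyBytes_AsString(bytes));
      done:
        if (fobj != NULL) {
            PyObject *r = PyObject_CallMethod(fobj, "close", NULL);
            Py_XDECREF(r);
        }
        Py_XDECREF(bytes);
        Py_XDECREF(text);
        Py_XDECREF(fobj);
        Py_XDECREF(io);
        return rc;
    }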

Regards,
Martin


Re: [Python-Dev] PEP 384: Defining a Stable ABI

2009-05-26 Thread M.-A. Lemburg
Nick Coghlan wrote:
> M.-A. Lemburg wrote:
>> Now, with the PEP, I have a feeling that the Python C-API
>> will in effect be limited to what's in the PEP's idea of
>> a usable ABI and open up the non-included public C-APIs
>> to the same rate of change as the private APIs.
> 
> Not really - before this PEP it was already fairly easy to write an
> extension that was source-level compatible with multiple versions of
> Python (depending on exactly what you wanted to do, of course).

Right and I hope that things stay that way.

> However, it is essentially impossible to make an extension that is
> binary level compatible with multiple versions.

On Windows, yes. On Unix, this often worked, even though it wasn't
always safe to do.

In practice it's usually better to recompile extensions for every
single release.

> With the defined stable ABI in place, each extension module author will
> be able to make a choice:
> - choose binary compatibility by limiting themselves to the stable ABI
> and be able to provide a single binary that will still work with later
> versions of Py3k
> - stick with source compatibility and continue to provide new binaries
> for each version of Python

Great !

>> An optional cross-version ABI would certainly be a good thing.
>>
>> Limiting the Python C-API would be counterproductive.
> 
> I don't think anyone would disagree with that. A discussion on C-API sig
> would certainly be a good idea.
> 
>>> During the compilation of applications, the preprocessor macro
>>> Py_LIMITED_API must be defined. Doing so will hide all definitions
>>> that are not part of the ABI.
>> So extensions wanting to use the full Python C-API as documented
>> in the C-API docs will still be able to do this, right ?
> 
> Yep - they just wouldn't define the new macro.

Good !

>>> Type Objects
>>> ------------
>>>
>>> The structure of type objects is not available to applications;
>>> declaration of "static" type objects is not possible anymore
>>> (for applications using this ABI).
>> Hmm, that's going to create big problems for extensions that
>> want to expose a C-API for their types: Type checks are normally
>> done by pointer comparison using those static type objects.
> 
> They would just have to expose "MyExtensionPrefix_MyType_Check" and
> "MyExtensionPrefix_MyType_CheckExact" functions the same way that types
> in the C API do.

Hmm, that's a function call per type check and will slow things
down a lot, esp. when working with APIs that deal a lot with
these objects.

The typical way to implement these type checks is via a simple
pointer comparison (falling back to a function for sub-types).
That's cheap and fast.
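
For illustration, here is a sketch of that idiom as it could still look
under the proposed ABI, assuming the type object is reachable through a
pointer filled in at module initialization (names are invented):

    /* Exact check: a single pointer comparison; no knowledge of the
       type structure is required. */
    #define MyObject_CheckExact(op) \
        (Py_TYPE(op) == (PyTypeObject *)MyType)

    /* General check: fast path first, function call only for the
       subclass case. */
    #define MyObject_Check(op) \
        (MyObject_CheckExact(op) || \
         PyType_IsSubtype(Py_TYPE(op), (PyTypeObject *)MyType))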

>>> Functions and function-like Macros
>>> ----------------------------------
>>>
>>> Function-like macros (in particular, field access macros) remain
>>> available to applications, but get replaced by function calls
>>> (unless their definition only refers to features of the ABI, such
>>> as the various _Check macros)
>> Including Py_INCREF()/Py_DECREF() ?
> 
> I believe so - MvL deliberately left the fields that the ref counting
> relies on as part of the ABI.

Hmm, another slow-down. This one has even more impact if you're
writing extensions that have to deal with lots of objects.

>>> Excluded Functions
>>> --
>>>
>>> Functions declared in the following header files are not part
>>> of the ABI:
>>> - cellobject.h
>>> - classobject.h
>>> - code.h
>>> - frameobject.h
>>> - funcobject.h
>>> - genobject.h
>>> - pyarena.h
>>> - pydebug.h
>>> - symtable.h
>>> - token.h
>>> - traceback.h
>> I don't think that's feasible: you basically remove all introspection
>> functions that way.
>>
>> This will need a more fine-grained approach.
> 
> I don't think it is reasonable to expect the introspection interfaces to
> remain stable at a binary level across versions.
> 
> Having "I want deep introspection support from C" and "I want to use a
> single binary for multiple Python versions" be mutually exclusive
> choices sounds like a perfectly sensible position to me.
> 
> Also, keep in mind that even an extension module that restricts itself
> to Py_LIMITED_API would still be able to call in to the Python
> equivalents via PyObject_Call and friends (e.g. by importing and using
> the inspect and traceback modules).

Sure, but they'd also want to print tracebacks or raise fatal
errors if necessary.
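
For illustration, a rough sketch of the workaround Nick describes,
printing the pending exception through the traceback module rather than
PyTraceBack_Print(); the helper name is invented and error handling is
abbreviated:

    #define Py_LIMITED_API
    #include <Python.h>

    static void
    print_current_exception(void)
    {
        PyObject *exc_type, *exc_value, *exc_tb, *mod, *res;

        PyErr_Fetch(&exc_type, &exc_value, &exc_tb);
        PyErr_NormalizeException(&exc_type, &exc_value, &exc_tb);

        mod = PyImport_ImportModule("traceback");
        if (mod != NULL) {
            /* traceback.print_exception(type, value, tb) */
            res = PyObject_CallMethod(mod, "print_exception", "OOO",
                                      exc_type ? exc_type : Py_None,
                                      exc_value ? exc_value : Py_None,
                                      exc_tb ? exc_tb : Py_None);
            Py_XDECREF(res);
            Py_DECREF(mod);
        }
        Py_XDECREF(exc_type);
        Py_XDECREF(exc_value);
        Py_XDECREF(exc_tb);
    }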

>> What if you mix extensions that use the full C-API with ones
>> that restrict themselves to the limited version ?
>>
>> Would creating a Python object in a full-API extension and
>> free'ing it in a limited-API extension cause problems ?
> 
> Possibly, if you end up mixing C runtimes in the process. Specifically:
> 1. Python linked with MSVCRT X
> 2. Full extension module linked with MSVCRT Y
> 3. Limited extension module linked with MSVCRT Z
> 
> The PyMem/PyObject APIs in the limited extension module will use the
> heap in MSVCRT X, since they will be redirected through the Python
> stable ABI 

Re: [Python-Dev] PEP 384: Defining a Stable ABI

2009-05-26 Thread M.-A. Lemburg
Martin v. Löwis wrote:
>> Now, with the PEP, I have a feeling that the Python C-API
>> will in effect be limited to what's in the PEP's idea of
>> a usable ABI and open up the non-included public C-APIs
>> to the same rate of change as the private APIs.
> 
> That's certainly not the plan. Instead, the plan is to have
> a stable ABI. The policy on the API isn't affected, except
> for restricting changes to the API that would break the ABI.

Thanks for clarifying this.

>>> During the compilation of applications, the preprocessor macro
>>> Py_LIMITED_API must be defined. Doing so will hide all definitions
>>> that are not part of the ABI.
>> So extensions wanting to use the full Python C-API as documented
>> in the C-API docs will still be able to do this, right ?
> 
> Correct. They would link to the version-specific DLL on Windows.

Good.

>>> The structure of type objects is not available to applications;
>>> declaration of "static" type objects is not possible anymore
>>> (for applications using this ABI).
>> Hmm, that's going to create big problems for extensions that
>> want to expose a C-API for their types: Type checks are normally
>> done by pointer comparison using those static type objects.
> 
> I don't see the problem. During module initialization, you
> create the type object and store it in a global variable, and
> then both clients and the module compare against the stored
> pointer.

Ah, good point !

>>> Function-like macros (in particular, field access macros) remain
>>> available to applications, but get replaced by function calls
>>> (unless their definition only refers to features of the ABI, such
>>> as the various _Check macros)
>> Including Py_INCREF()/Py_DECREF() ?
> 
> Yes, although some people are requesting that these become functions.

I'd opt against that, simply because it creates a lot of overhead
due to the function call and issues with cache locality.
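
For what it's worth, a rough illustration of the difference under
discussion (these are not the actual CPython definitions):

    #include <Python.h>

    /* As a macro, an incref inlines to a single increment of a field
       the PEP keeps visible in the ABI: */
    #define Example_INCREF(op)  (((PyObject *)(op))->ob_refcnt++)

    /* As an exported function, every incref/decref becomes a call that
       crosses the DLL boundary:
           void Example_IncRef(PyObject *op);   -- lives in pythonXY.dll */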

>>> Excluded Functions
>>> ------------------
>>>
>>> Functions declared in the following header files are not part
>>> of the ABI:
>>> - cellobject.h
>>> - classobject.h
>>> - code.h
>>> - frameobject.h
>>> - funcobject.h
>>> - genobject.h
>>> - pyarena.h
>>> - pydebug.h
>>> - symtable.h
>>> - token.h
>>> - traceback.h
>> I don't think that's feasible: you basically remove all introspection
>> functions that way.
>>
>> This will need a more fine-grained approach.
> 
> What specifically is it that you want to do in a module that you
> couldn't do anymore?

See my reply to Nick: some of the functions are needed even
if you don't want to do introspection, such as Py_FatalError()
or PyTraceBack_Print().

BTW: Given the headline, I take it that the various type checking
> macros in these headers will still be available, right ?

>>> On Windows, applications shall link with python3.dll;
>> You mean: extensions that were compiled with Py_LIMITED_API, right ?
> 
> Correct, see "Terminology" in the PEP.

Good, thanks.

>>> an import
>>> library python3.lib will be available. This DLL will redirect all of
>>> its API functions through /export linker options to the full
>>> interpreter DLL, i.e. python3y.dll.
>> What if you mix extensions that use the full C-API with ones
>> that restrict themselves to the limited version ?
> 
> Some link against python3.dll, others against python32.dll (say).
> 
>> Would creating a Python object in a full-API extension and
>> free'ing it in a limited-API extension cause problems ?
> 
> No problem that I can see.

Can we be sure that the MSVCRT used by python35.dll stays compatible
with the one used by, say, python32.dll ? What if the CRT memory
management changes between MSVCRT versions ?

Another aspect to consider:

How will this work in the light of having multiple copies of
Python installed on a Windows machine ?

The implementation section suggests that python3.dll would always
redirect to the python3x.dll for which it was installed, i.e. if
I have Python 3.5 installed, but then need to run some app with
Python 3.2, the installed python3.dll would then point back to the
python32.dll.

Now, if I start a Python 3.5 application which uses a limited
API extension, this would try to load python32.dll into the
Python 3.5 process. AFAIK, that's not possible due to the
naming conflicts.

>>> This PEP will be implemented in a branch, allowing users to check
>>> whether their modules conform to the ABI. To simplify this testing, an
>>> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing
>>> type object layout, to let users postpone rewriting all types. When
>>> this branch is merged into the 3.2 code base, this macro will
>>> be removed.
>> Now I'm confused again: this sounds a lot like you do want all extension
>> writers to only use the limited API.
> 
> I certainly want to support as many modules as reasonable with the PEP.
> Whether or not developers then choose to build version-independent
> binaries is certainly outside the scope of the PEP - it only specifies
> action items for Python, not for application authors.

Re: [Python-Dev] PEP 384: Defining a Stable ABI

2009-05-26 Thread Martin v. Löwis
>>>> The structure of type objects is not available to applications;
>>>> declaration of "static" type objects is not possible anymore
>>>> (for applications using this ABI).
>>> Hmm, that's going to create big problems for extensions that
>>> want to expose a C-API for their types: Type checks are normally
>>> done by pointer comparison using those static type objects.
>> They would just have to expose "MyExtensionPrefix_MyType_Check" and
>> "MyExtensionPrefix_MyType_CheckExact" functions the same way that types
>> in the C API do.
> 
> Hmm, that's a function call per type check and will slow things
> down a lot, esp. when working with APIs that deal a lot with
> these objects.

See my other response. You can continue to provide _Check
macros; knowledge of the structure of types is not necessary to
perform such checks.

> The typical way to implement these type checks is via a simple
> pointer comparison (falling back to a function for sub-types).
> That's cheap and fast.

And will continue to be available to ABI-compliant extensions.

>>> Including Py_INCREF()/Py_DECREF() ?
>> I believe so - MvL deliberately left the fields that the ref counting
>> relies on as part of the ABI.
> 
> Hmm, another slow-down.

??? Why is "no change" a slow-down?

> This is not much of an issue if the C runtime DLL doesn't change
> between releases, but it becomes a problem when they do e.g.
> due to an upgrade to a new MSVC++ compiler version or in case
> the extension was downloaded pre-compiled from pypi or some
> other site.

What problem specifically may occur?

Regards,
Martin


Re: [Python-Dev] PEP 384: Defining a Stable ABI

2009-05-26 Thread Martin v. Löwis
>>>> Functions declared in the following header files are not part
>>>> of the ABI:
>>>> - cellobject.h
>>>> - classobject.h
>>>> - code.h
>>>> - frameobject.h
>>>> - funcobject.h
>>>> - genobject.h
>>>> - pyarena.h
>>>> - pydebug.h
>>>> - symtable.h
>>>> - token.h
>>>> - traceback.h
>>> I don't think that's feasible: you basically remove all introspection
>>> functions that way.
>>>
>>> This will need a more fine-grained approach.
>> What specifically is it that you want to do in a module that you
>> couldn't do anymore?
> 
> See my reply to Nick: some of the functions are needed even
> if you don't want to do introspection, such as Py_FatalError()

Ok. I don't know what Py_FatalError is doing in pydebug.h, so I
now propose to move it to pyerrors.h.

> or PyTraceBack_Print().

Ok; I have removed traceback.h from the list. By the other rules
of the PEP, the only function that becomes available then is
PyTraceBack_Print.

> BTW: Given the headline, I take it that the various type checking
> macros in these headers will still be available, right ?

Which headers? The ones on the list above? No; my idea would
be to completely hide them as-is.

All other type-checking macros will remain available, and
will remain macros.

>>> Would creating a Python object in a full-API extension and
>>> free'ing it in a limited-API extension cause problems ?
>> No problem that I can see.
> 
> Can we be sure that the MSVCRT used by python35.dll stays compatible
> with the one used by, say, python32.dll ? What if the CRT memory
> management changes between MSVCRT versions ?

It doesn't matter. For Python "things", the extension module will
use the pymem.h functions, which get routed through pythonxy.dll
to the CRT that Python was built with.

If the extension uses regular malloc(), it should also invoke
regular free() on the pointer. There is no API where Python
calls malloc directly and the extension calls free, or vice
versa.
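
For illustration, the pairing rule in sketch form (the function is
invented, not from the PEP):

    #include <Python.h>
    #include <stdlib.h>

    void
    allocator_pairing_example(void)
    {
        /* Python-side allocation: routed through pythonxy.dll, so it is
           released by the CRT Python itself was built with. */
        void *p = PyMem_Malloc(256);
        if (p != NULL)
            PyMem_Free(p);          /* never plain free(p) */

        /* Extension-side allocation: stays entirely within the
           extension's own CRT. */
        char *q = (char *)malloc(256);
        if (q != NULL)
            free(q);                /* never PyMem_Free(q) */
    }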

> How will this work in the light of having multiple copies of
> Python installed on a Windows machine ?

Interesting question. One solution could be to use SxS, which
would allow multiple concurrent installations of python3.dll,
although we would need to make sure it always binds to the
"right" one in each context.

Another solution could be to keep the various copies of python3.dll
in their respective PYTHONHOMEs, and leave it to python.exe or the
app to load the right one; any subsequent extension modules should
then pick up the one that was already loaded.

> The implementation section suggests that python3.dll would always
> redirect to the python3x.dll for which it was installed, i.e. if
> I have Python 3.5 installed, but then need to run some app with
> Python 3.2, the installed python3.dll would then point back to the
> python32.dll.

That depends on where they get installed. If they all go into system32,
only the most recent one would be available, which is probably not
desirable.

> Now, if I start a Python 3.5 application which uses a limited
> API extension, this would try to load python32.dll into the
> Python 3.5 process. AFAIK, that's not possible due to the
> naming conflicts.

I don't see this problem. As long as we manage to install multiple
versions of python3.dll on the system somehow, different processes
could certainly load different such DLLs, and the same extension
module would always use the right one.

Regards,
Martin


[Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Phillip Sitbon
Hi everyone,

I'm new to the list but I've been embedding Python and working very
closely with the core sources for many years now. I discovered Python
a long time ago when I needed to embed a scripting language and found
the PHP sources... unreadable ;)

Anyway, I'd like to ask something that may have been asked already, so
I apologize if this has been covered.

Instead of removing the GIL, has anyone thought of making it more
lightweight? The current situation for Windows is that the
single-thread case is decently fast (via interlocked operations), but
it drops to using an event object in the case of contention. (see
thread_nt.h)

Now, I don't have any specific evidence aside from my experience in
Windows multithreaded programming, but event objects are often
considered the slowest synchronization mechanism available. So, what
are the alternatives? Mutexes or critical sections. Semaphores too, if
you want to get fancy, but I digress.

Because mutexes have the capability of inter-process locking, which we
don't need, critical sections fit the bill as a lightweight locking
mechanism. They work in a way similar to how the Python GIL is
handled: first, attempt an interlocked operation, and if another
thread owns the lock, wait on a kernel object. They are known to be
extremely fast.
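
For illustration, a hedged sketch of such a lock (this is not the actual
thread_nt.h patch; the names and spin count are invented):

    #include <windows.h>

    typedef struct {
        CRITICAL_SECTION cs;
    } cs_lock;

    static void
    cs_lock_init(cs_lock *lock)
    {
        /* The spin count lets contended acquisitions busy-wait briefly
           before the thread falls back to a kernel wait; 4000 is a
           commonly quoted value, not a tuned one. */
        InitializeCriticalSectionAndSpinCount(&lock->cs, 4000);
    }

    static int
    cs_lock_acquire(cs_lock *lock, int wait)
    {
        if (!wait)
            return TryEnterCriticalSection(&lock->cs) ? 1 : 0;
        EnterCriticalSection(&lock->cs);
        return 1;
    }

    static void
    cs_lock_release(cs_lock *lock)
    {
        LeaveCriticalSection(&lock->cs);   /* must be the owning thread */
    }

    static void
    cs_lock_destroy(cs_lock *lock)
    {
        DeleteCriticalSection(&lock->cs);
    }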

There are some catches with using a critical section instead of the
current method:

1. It is recursive, while the current GIL setup is not. Would it break
Python to support (or deal with) recursive behavior at the GIL level?
Note that we can still disallow recursion and fail because we know if
the current thread is the lock owner, but the return from the lock
function is usually only checked when the wait parameter is zero
(meaning "don't block, just try to acquire"). The biggest problem I
see here is how mixing the PyGILState_* API with multiple interpreters
will behave: when PyGILState_Ensure() is called while the GIL is held
for a thread state under an interpreter other than the main
interpreter, it tries to re-lock the GIL. This would normally cause a
deadlock, but the best we could do with a critical section is have the
call fail and/or increase a recursion counter. If maintaining behavior
is absolutely necessary, I guess it would be pretty easy to force a
deadlock. Personally, I would prefer a Py_FatalError or something like
it.

2. Backwards incompatibility: TryEnterCriticalSection isn't available
pre-NT4, so Windows 95 support is broken. Microsoft doesn't support or
even mention it in the list of supporting OSes for their API functions
anymore, so... non-issue? Some of the data structure is available to
us, so I bet it would be easy to implement the function manually.

3. ?? - I'm sure there are other issues that deserve a look.

I've given this a shot already while doing some concurrency testing
with my ISAPI extension (PyISAPIe). First of all, nothing looks broken
yet. I'm using my modified python26.dll to run all of my Python code
and trying to find anywhere it could possibly break. For multiple
concurrent requests against a single multithreaded ISAPI handler
process, I see a statistically significant speed increase depending on
how much Python code is executed. With more Python code executed (e.g.
a Django page), the speedup was about 2x. I haven't tested with varied
values for _Py_CheckInterval aside from finding a sweet spot for my
specific purposes, but using 100 (the default) would likely make the
performance difference more noticeable. A spin mutex also does well,
but the results vary a lot more.

Just as a disclaimer, my tests were nowhere near scientific, but if
anyone needs convincing I can come up with some actual measurements. I
think at this point most of you are wondering more about what it would
break.

Hopefully I haven't wasted anyone's time - I just wanted to share what
I see as a possibly substantial improvement to Python's core. Let me
know if you're interested in a patch to use for your own testing.

Cheers,

Phillip


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Antoine Pitrou

Hello,

> Hopefully I haven't wasted anyone's time - I just wanted to share what
> I see as a possibly substantial improvement to Python's core. let me
> know if you're interested in a patch to use for your own testing.

You should definitely open a bug entry in http://bugs.python.org. There, post
your patch, some explanations and preferably a quick way (e.g. a simple script)
of reproducing the speedups (without having to install a third-party library or
extension, that is).

Thanks

Antoine.




Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Glenn Linderman
On approximately 5/26/2009 12:48 PM, came the following characters from 
the keyboard of Phillip Sitbon:

> Hi everyone,
>
> I'm new to the list but I've been embedding Python and working very
> closely with the core sources for many years now. I discovered Python
> a long time ago when I needed to embed a scripting language and found
> the PHP sources... unreadable ;)
>
> ...
>
> I've given this a shot already while doing some concurrency testing
> with my ISAPI extension (PyISAPIe). First of all, nothing looks broken
> yet. I'm using my modified python26.dll to run all of my Python code
> and trying to find anywhere it could possibly break. For multiple
> concurrent requests against a single multithreaded ISAPI handler
> process, I see a statistically significant speed increase depending on
> how much Python code is executed. With more Python code executed (e.g.
> a Django page), the speedup was about 2x. I haven't tested with varied
> values for _Py_CheckInterval aside from finding a sweet spot for my
> specific purposes, but using 100 (the default) would likely make the
> performance difference more noticeable. A spin mutex also does well,
> but the results vary a lot more.
>
> Just as a disclaimer, my tests were nowhere near scientific, but if
> anyone needs convincing I can come up with some actual measurements. I
> think at this point most of you are wondering more about what it would
> break.
>
> Hopefully I haven't wasted anyone's time - I just wanted to share what
> I see as a possibly substantial improvement to Python's core. Let me
> know if you're interested in a patch to use for your own testing.


I wonder if the patch could be structured as a conditional compilation?
You know how many different spots are touched, and how many lines per
spot.

If it could be, then theoretically it could be released and people could
do lots of comparative stress testing with different workloads.



--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Martin v. Löwis
> 3. ?? - I'm sure there are other issues that deserve a look.

What about fairness? I don't know off-hand whether the GIL is
fair, or whether critical sections are fair, but it needs to be
considered.

Regards,
Martin


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Phillip Sitbon
> You should definitely open a bug entry in http://bugs.python.org. There, post
> your patch, some explanations and preferably a quick way (e.g. a simple
> script) of reproducing the speedups (without having to install a third-party
> library or extension, that is).

I'll get started on that. I'm assuming I should generate a patch from
the trunk (2.7)? The file doesn't look different, but I want to make
sure I get it from the right place.

> I wonder if the patch could be structured as a conditional compilation?  You
> know how many different spots are touched, and how many lines per spot.
>
> If it could be, then theoretically it could be released and people could  do
> lots of comparative stress testing with different workloads.

That would be easy to do, because I am just replacing the
*NonRecursiveMutex functions.

> What about fairness? I don't know off-hand whether the GIL is
> fair, or whether critical sections are fair, but it needs to be
> considered.

If you define fairness in this context as not starving other threads
while consuming resources, that is built into the interpreter via
sys.setcheckinterval() and also anywhere the GIL is released for I/O.
What might be interesting is to see if releasing a critical section
and immediately re-acquiring it every _Py_CheckInterval bytecode
operations behaves in a similar manner (see ceval.c, line 869). My
best guess right now is that it will behave as expected when not using
the spin-based critical section. AFAIK, the kernel processes waiters
in a FIFO manner without regard to priority. Because a guarantee of
mutual exclusion is absolutely necessary, it's up to applications to
provide fairness. Python does a decent job of this.
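
For illustration, a hedged sketch of the shape of that periodic hand-off
(this is not the real ceval.c code; the helper and ticker are invented):

    #include <Python.h>

    /* Every "check interval" bytecodes the GIL is dropped and
       immediately re-taken so that a waiting thread can be scheduled
       in between.  Whether a waiter actually gets in depends on the
       fairness of the underlying lock, which is the question above. */
    static void
    maybe_yield_gil(int *ticker)
    {
        if (--(*ticker) > 0)
            return;
        *ticker = 100;                 /* sys.getcheckinterval() default */
        Py_BEGIN_ALLOW_THREADS
        /* another thread may acquire the GIL here */
        Py_END_ALLOW_THREADS
    }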

- Phillip


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Tim Lesher
On Tue, May 26, 2009 at 16:07, "Martin v. Löwis"  wrote:
>> 3. ?? - I'm sure there are other issues that deserve a look.
>
> What about fairness? I don't know off-hand whether the GIL is
> fair, or whether critical sections are fair, but it needs to be
> considered.

FWIW, Win32 CriticalSections are guaranteed to be fair, but they don't
guarantee a defined order of wakeup among threads of equal priority.

-- 
Tim Lesher 


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Antoine Pitrou
Martin v. Löwis writes:
> 
> What about fairness? I don't know off-hand whether the GIL is
> fair,

According to a past discussion on this list, the current implementation isn't:
http://mail.python.org/pipermail/python-dev/2008-March/077814.html
(at least on the poster's system)

Regards

Antoine.




Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Phillip Sitbon
> FWIW, Win32 CriticalSections are guaranteed to be fair, but they don't
> guarantee a defined order of wakeup among threads of equal priority.

Indeed, I should have quoted the MSDN docs:

"The threads of a single process can use a critical section object for
mutual-exclusion synchronization. There is no guarantee about the
order in which threads will obtain ownership of the critical section,
however, the system will be fair to all threads."

http://msdn.microsoft.com/en-us/library/ms683472(VS.85).aspx

I read somewhere else that the FIFO order is present, but obviously we
shouldn't expect that if it's not documented as such.

> According to a past discussion on this list, the current implementation isn't:
> http://mail.python.org/pipermail/python-dev/2008-March/077814.html
> (at least on the poster's system)
>

I believe he's only talking about Linux. Apples & oranges when it
comes to stuff like this, although it still justifies looking into
what happens every _Py_CheckInterval on Windows.

- Phillip


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Martin v. Löwis
> If you define fairness in this context as not starving other threads
> while consuming resources, that is built into the interpreter via
> sys.setcheckinterval() and also anywhere the GIL is released for I/O.
> What might be interesting is to see if releasing a critical section
> and immediately re-acquiring it every _Py_CheckInterval bytecode
> operations behaves in a similar manner (see ceval.c, line 869). My
> best guess right now is that it will behave as expected when not using
> the spin-based critical section. AFAIK, the kernel processes waiters
> in a FIFO manner without regard to priority. Because a guarantee of
> mutual exclusion is absolutely necessary, it's up to applications to
> provide fairness. Python does a decent job of this.

No: fairness in mutex synchronization means that every waiter for the
mutex will eventually acquire it; it won't happen that one thread
starves waiting for the mutex. This is something that the mutex needs to
provide, not the application.

Regards,
Martin


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Martin v. Löwis
>> According to a past discussion on this list, the current implementation 
>> isn't:
>> http://mail.python.org/pipermail/python-dev/2008-March/077814.html
>> (at least on the poster's system)
>>
> 
> I believe he's only talking about Linux. Apples & oranges when it
> comes to stuff like this

Please trust Antoine that it's relevant: if the current implementation
isn't fair on Linux, there is no need for the new implementation to be
fair on Windows.

Regards,
Martin


Re: [Python-Dev] Making the GIL faster & lighter on Windows

2009-05-26 Thread Phillip Sitbon
> No: fairness in mutex synchronization means that every waiter for the
> mutex will eventually acquire it; it won't happen that one thread
> starves waiting for the mutex. This is something that the mutex needs to
> provide, not the application.

Right, I guess I was thinking of it in terms of needing to release the
mutex at some point in order for it to be later acquired.

> Please trust Antoine that it's relevant: if the current implementation
> isn't fair on Linux, there is no need for the new implementation to be
> fair on Windows.

Fair enough.

--

While setting up my patch, I'm noticing something that could be
potentially bad for this idea that I overlooked until just now. I'm
going to hold off on submitting a ticket unless others suggest it's a
better idea to keep this discussion going there.

The thread module's lock object uses the same code used to lock and
unlock the GIL. By replacing the current locking mechanism with a
critical section, it'd be breaking the expected functionality of the
lock object, specifically two cases:

1. Blocking recursion: Critical sections don't block on recursion, no
way to enforce that
2. Releasing: Currently any thread can release a lock, but only the
owner can release a critical section

Of course blocking recursion is only meaningful with the current
behavior of #2, otherwise it's an unrecoverable deadlock.
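
For illustration, the mismatch in sketch form (the function names are
invented; the two functions are meant to run in different threads):

    #include <Python.h>
    #include "pythread.h"

    static PyThread_type_lock shared_lock;   /* allocated once, e.g. at startup */

    static void
    setup(void)
    {
        shared_lock = PyThread_allocate_lock();
    }

    static void
    runs_in_thread_A(void)
    {
        /* Acquire here, then hand the work (and the lock) to thread B. */
        PyThread_acquire_lock(shared_lock, WAIT_LOCK);
    }

    static void
    runs_in_thread_B(void)
    {
        /* Legal with the current lock object and relied upon by Python
           code; a CRITICAL_SECTION may only be left by its owner. */
        PyThread_release_lock(shared_lock);
    }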

There are a few solutions to this. The first would be to implement
only the GIL as a critical section. The problem then is the need to
change all of the core code that does not use
PyEval_Acquire/ReleaseLock (there is some, right?), which is the best
place to use something other than the thread module's locking
mechanism on the GIL. This is doable with some effort, but clearly not
an option if there is any possibility that extensions are using
something other than the PyThreadState_*, PyGILState_* and PyEval_*
APIs to manipulate the GIL (are there others?). After any of this, of
course, I wonder what kind of crazy things might be expected of the
GIL externally that require its behavior to remain as it is.

The second solution would be to use semaphores. I can't say yet if it
would be worth it performance-wise so I will refrain from conjecture
for the moment.

I like the first solution above... I don't know why non-recursion
would be necessary for the GIL; clearly it would be a little more
involved, but if I can demonstrate the performance gain maybe it's
worth my time.

- Phillip