Re: [Python-Dev] C API changes

2018-11-30 Thread Victor Stinner
I just would want to say that I'm very happy to read the discussions
about the C API finally happening on python-dev :-) The discussion is
very interesting!

> C is really the lingua franca

Sorry, but why not write the API directly in French?

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API changes

2018-11-30 Thread Steve Dower

On 30Nov2018 1133, Antoine Pitrou wrote:

On Fri, 30 Nov 2018 13:06:11 -0600
Neil Schemenauer  wrote:

On 2018-11-29, Armin Rigo wrote:

...Also, although I'm discussing it here, I think the whole approach
would be better if done as a third-party extension for now, without
requiring changes to CPython---just use the existing C API to
implement the CPython version.


Hello Armin,

Thank you for providing your input on this subject.  I too like the
idea of an API "shim layer" as a separate project.

What do you think of writing the shim layer in C++?  I'm not a C++
programmer but my understanding is that modern C++ compilers are
much better than years ago.  Using C++ would allow us to provide a
higher level API with smaller runtime costs.


The main problem with exposing a C++ *API* is that all people
implementing that API suddenly must understand and implement the C++
*ABI* (which itself varies from platform to platform :-)).  That's
trivially easy if your implementation is itself written in C++, but not
if it's written in something else such as RPython, Java, Rust, etc.

C is really the lingua franca when exposing an interface that can be
understood, implemented and/or interfaced with from many different
languages.

So I'd turn the proposal on its head: you can implement the internals
of your interpreter or object layer in C++ (and indeed I think it
would be crazy to start writing a new Python VM in raw C), but you
should still expose a C-compatible API for third-party providers and
consumers.


I totally agree with Antoine here. C++ is great for internals, but not 
the public interfaces.


The one additional point I'd add is that there are other ABIs that C++ 
can use (such as xlang, Corba and COM), which can provide stability in 
ways the plain-old C++ ABI does not. So we wouldn't necessarily have to 
design a new C-based ABI for this, we could adopt an existing one that 
is already proven and already has supporting tools.


Cheers,
Steve




Re: [Python-Dev] C API changes

2018-11-30 Thread Antoine Pitrou
On Fri, 30 Nov 2018 13:06:11 -0600
Neil Schemenauer  wrote:
> On 2018-11-29, Armin Rigo wrote:
> > ...Also, although I'm discussing it here, I think the whole approach
> > would be better if done as a third-party extension for now, without
> > requiring changes to CPython---just use the existing C API to
> > implement the CPython version.  
> 
> Hello Armin,
> 
> Thank you for providing your input on this subject.  I too like the
> idea of an API "shim layer" as a separate project.
> 
> What do you think of writing the shim layer in C++?  I'm not a C++
> programmer but my understanding is that modern C++ compilers are
> much better than years ago.  Using C++ would allow us to provide a
> higher level API with smaller runtime costs.

The main problem with exposing a C++ *API* is that all people
implementing that API suddenly must understand and implement the C++
*ABI* (which itself varies from platform to platform :-)).  That's
trivially easy if your implementation is itself written in C++, but not
if it's written in something else such as RPython, Java, Rust, etc.

C is really the lingua franca when exposing an interface that can be
understood, implemented and/or interfaced with from many different
languages.

So I'd turn the proposal on its head: you can implement the internals
of your interpreter or object layer in C++ (and indeed I think it
would be crazy to start writing a new Python VM in raw C), but you
should still expose a C-compatible API for third-party providers and
consumers.

Regards

Antoine.




Re: [Python-Dev] C API changes

2018-11-30 Thread Armin Rigo
Hi Steve,

On 30/11/2018, Steve Dower  wrote:
> On 29Nov2018 2206, Armin Rigo wrote:
>> On Thu, 29 Nov 2018 at 18:19, Steve Dower  wrote:
>>> quo. We continue to not be able to change CPython internals at all,
>>> since that will break people using option B.
>>
>> No?  That will only break users if they only have an option-B
>> ``foo.cpython-318m-x86_64-linux-gnu.so``, no option-A .so and no
>> source code, and want to run it elsewhere than CPython 3.18.  That's
>> the same as today.  If you want option-B .so for N versions of
>> CPython, recompile the source code N times.
>>
>> Just to be clear, if done correctly there should be no need for
>> #ifdefs in the source code of the extension module.
>
> The problem is that if option B remains as compatible as it is today, we
can't make option A fast enough to be attractive. The marketing pitch
for this looks like: "rewrite all your existing code to be slower but
work with PyPy, or don't rewrite your existing code and it'll be
> fastest with CPython and won't break in the future". This is status quo
> (where option A today is something like CFFI or Cython), and we can
> already see how many people have made the switch (FWIW, I totally prefer
> Cython over pure C for my own projects :) ).
>
> My proposed marketing pitch is: "rewrite your existing code to be
> forward-compatible today and faster in the future without more work, or
> be prepared to rewrite/update your source code for each CPython release
> to remain compatible with the low level API".
> (...)

Discussing marketing pitches on python-dev is not one of my favorite
pastimes, so I'll excuse myself from this conversation.  Instead,
I might try to implement the basics, check out the performance on
CPython and on PyPy, and seek out interest---I'm thinking about
Cython, for example, which might relatively easily be adapted to
generate that kind of code.  This might be a solution for the poor
performance of Cython on PyPy...  If everything works out, maybe I'll
come back here at some point, with the argument "the CPython C API is
blocking CPython's evolution more and more?  Here's one possible
path forward."


A bientôt,

Armin.


Re: [Python-Dev] C API changes

2018-11-30 Thread Neil Schemenauer
On 2018-11-29, Armin Rigo wrote:
> ...Also, although I'm discussing it here, I think the whole approach
> would be better if done as a third-party extension for now, without
> requiring changes to CPython---just use the existing C API to
> implement the CPython version.

Hello Armin,

Thank you for providing your input on this subject.  I too like the
idea of an API "shim layer" as a separate project.

What do you think of writing the shim layer in C++?  I'm not a C++
programmer but my understanding is that modern C++ compilers are
much better than years ago.  Using C++ would allow us to provide a
higher level API with smaller runtime costs.  However, it would
require that any project using the shim layer would have to be
compiled with a C++ compiler (CPython and PyPy could still expose a
C compatible API).

Perhaps it is a bad idea.  If someone does create such a shim layer,
it will already be challenging to convince extension authors to move
to it.  If it requires them to switch to using a C++ compiler rather
than a C compiler, maybe that's too much effort.  OTOH, with C++ I
think you could do things like use smart pointers to automatically
handle refcounts on the handles.  Or maybe we should just skip C++
and implement the layer in Rust.  Then the Rust borrow checker can
handle the refcounts. ;-)
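Neil's smart-pointer idea can be sketched in C++. All names here are invented for illustration: ``MockObject`` stands in for whatever a shim's handles would refer to, and ``handle_incref``/``handle_decref`` for a hypothetical shim refcounting API.

```cpp
#include <cassert>

// Mock refcounted object standing in for whatever the shim's handles
// refer to; this is not a real CPython or shim type.
struct MockObject { int refcnt = 1; };

inline void handle_incref(MockObject* o) { ++o->refcnt; }
inline void handle_decref(MockObject* o) { --o->refcnt; }

// RAII wrapper in the spirit of a smart pointer: the destructor
// releases the reference, so extension code cannot leak or double-free
// a handle by forgetting the equivalent of a Py_DECREF.
class Handle {
    MockObject* obj_;
public:
    explicit Handle(MockObject* o) : obj_(o) {}  // takes ownership of an existing reference
    Handle(const Handle& other) : obj_(other.obj_) { handle_incref(obj_); }
    Handle& operator=(const Handle&) = delete;
    ~Handle() { handle_decref(obj_); }
    MockObject* get() const { return obj_; }
};
```

Whether a real shim would expose such a wrapper, or leave it to downstream bindings, is exactly the kind of design question being discussed here.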

Regards,

  Neil


[Python-Dev] C API changes

2018-11-30 Thread Stefan Krah


Steve Dower wrote:
> My proposed marketing pitch is: "rewrite your existing code to be 
> forward-compatible today and faster in the future without more work, or 
> be prepared to rewrite/update your source code for each CPython release 
> to remain compatible with the low level API". The promise of "faster in 
> the future" needs to be justified (and I think there's plenty of 
> precedent in PyPy, Larry's Gilectomy and the various JavaScript VMs to 
> assume that we can do it).

It's hard to discuss this in the abstract without knowing how big the
breakage between each version is going to be.

But for the scientific ecosystem this sounds a bit like a potential
Python-4.0 breakage, which was universally rejected (so far).

In the extreme case I can imagine people staying on 3.7.

But it really depends on the nature of the changes.



Stefan Krah






Re: [Python-Dev] C API changes

2018-11-30 Thread Antoine Pitrou
On Fri, 30 Nov 2018 09:22:30 -0800
Steve Dower  wrote:
> 
> My proposed marketing pitch is: "rewrite your existing code to be 
> forward-compatible today and faster in the future without more work, or 
> be prepared to rewrite/update your source code for each CPython release 
> to remain compatible with the low level API". The promise of "faster in 
> the future" needs to be justified (and I think there's plenty of 
> precedent in PyPy, Larry's Gilectomy and the various JavaScript VMs to 
> assume that we can do it).

I think that should be qualified.  Technically it's certainly possible
to have a faster CPython with different internals.  Socially and
organisationally I'm not sure we're equipped to achieve it.

Regards

Antoine.




Re: [Python-Dev] C API changes

2018-11-30 Thread Steve Dower

On 29Nov2018 2206, Armin Rigo wrote:

On Thu, 29 Nov 2018 at 18:19, Steve Dower  wrote:

quo. We continue to not be able to change CPython internals at all,
since that will break people using option B.


No?  That will only break users if they only have an option-B
``foo.cpython-318m-x86_64-linux-gnu.so``, no option-A .so and no
source code, and want to run it elsewhere than CPython 3.18.  That's
the same as today.  If you want option-B .so for N versions of
CPython, recompile the source code N times.

Just to be clear, if done correctly there should be no need for
#ifdefs in the source code of the extension module.


The problem is that if option B remains as compatible as it is today, we 
can't make option A fast enough to be attractive. The marketing pitch 
for this looks like: "rewrite all your existing code to be slower but 
work with PyPy, or don't rewrite your existing code and it'll be 
fastest with CPython and won't break in the future". This is status quo 
(where option A today is something like CFFI or Cython), and we can 
already see how many people have made the switch (FWIW, I totally prefer 
Cython over pure C for my own projects :) ).


My proposed marketing pitch is: "rewrite your existing code to be 
forward-compatible today and faster in the future without more work, or 
be prepared to rewrite/update your source code for each CPython release 
to remain compatible with the low level API". The promise of "faster in 
the future" needs to be justified (and I think there's plenty of 
precedent in PyPy, Larry's Gilectomy and the various JavaScript VMs to 
assume that we can do it).


We've already done enough investigation to know that making the runtime 
faster requires changing the low level APIs, and otherwise we're stuck 
in a local optimum. Offering a stable, loosely coupled option A and then 
*planning* to break the low level APIs each version in the name of 
performance is the only realistic way to change what we're currently doing.


Cheers,
Steve


Re: [Python-Dev] C API changes

2018-11-29 Thread Armin Rigo
Hi,

On Thu, 29 Nov 2018 at 18:19, Steve Dower  wrote:
> quo. We continue to not be able to change CPython internals at all,
> since that will break people using option B.

No?  That will only break users if they only have an option-B
``foo.cpython-318m-x86_64-linux-gnu.so``, no option-A .so and no
source code, and want to run it elsewhere than CPython 3.18.  That's
the same as today.  If you want option-B .so for N versions of
CPython, recompile the source code N times.

Just to be clear, if done correctly there should be no need for
#ifdefs in the source code of the extension module.


A bientôt,

Armin.


Re: [Python-Dev] C API changes

2018-11-29 Thread Steve Dower

On 28Nov2018 2208, Armin Rigo wrote:

Hi Steve,

On Tue, 27 Nov 2018 at 19:14, Steve Dower  wrote:

On 27Nov2018 0609, Victor Stinner wrote:

Note: Again, in my plan, the new C API would be an opt-in API. The old
C API would remain unchanged and fully supported. So there is no
impact on performance if you choose to use the old C API.


This is one of the things that makes me think your plan is not feasible.


I can easily imagine the new API having two different implementations
even for CPython:

A) you can use the generic implementation, which produces a
cross-python-compatible .so.  All function calls go through the API at
runtime.  The same .so works on any version of CPython or PyPy.

B) you can use a different set of headers or a #define or something,
and you get a higher-performance version of your unmodified
code---with the issue that the .so only runs on the exact version of
CPython.  This is done by defining some of the functions as macros.  I
would expect this version to be of similar speed to the current C
API in most cases.

This might give a way forward: people would initially port their
extensions hoping to use the option B; once that is done, they can
easily measure---not guess---the extra performance costs of the
option A, and decide based on actual data if the difference is really
worth the additional troubles of distributing many versions.  Even if
it is, they can distribute an A version for PyPy and for unsupported
CPython versions, and add a few B versions on top of that.


This makes sense, but unless it results in PyPy drastically gaining 
popularity as a production runtime, it basically leaves us in the status 
quo. We continue to not be able to change CPython internals at all, 
since that will break people using option B.


Though potentially if we picked an official option for A, we could 
deprecate the stability of option B (over a few releases) and require 
people using it to thoroughly test, update and #ifdef their code for 
each version. That would allow us to make changes to the runtime while 
preserving option A as the reliable version.


You might want to have a look at https://github.com/Microsoft/xlang/ 
which is not yet ready for showtime (in particular, there's no "make it 
look Pythonic" support yet), but is going to extend our existing 
cross-language ABI to Python (alongside C++/.NET/JS) and non-Windows 
platforms. It's been in use for years in Windows and has been just fine. 
(Sample generated output at 
https://github.com/devhawk/pywinrt-output/tree/master/generated/pyrt/src 
but the design docs at the first link are probably most interesting.)


Cheers,
Steve


Re: [Python-Dev] C API changes

2018-11-28 Thread Chris Angelico
On Thu, Nov 29, 2018 at 5:10 PM Armin Rigo  wrote:
> PS: on CPython one could use ``typedef struct { PyObject *_obj; }
> PyHandle;``.  This works like a pointer, but you can't use ``==`` to
> compare them.

And then you could have a macro or inline function to compare them,
simply by looking at that private member, and it should compile down
to the exact same machine code as comparing the original pointers
directly. It'd be a not-unreasonable migration path, should you want
to work that way - zero run-time cost.
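The struct-wrapped handle plus an explicit identity check can be sketched as follows. ``PyHandle`` and ``handle_is`` are hypothetical names for illustration, not real CPython API.

```cpp
#include <cassert>

// A pointer-sized handle that is not directly a pointer.  Comparing two
// PyHandle values with == is rejected outright by a C compiler, and in
// C++ (before C++20) it fails too unless an operator== is supplied, so
// callers are forced through the explicit identity check below.
struct PyHandle { void* _obj; };

// Inline identity check: compiles down to a plain pointer comparison,
// so the migration has zero run-time cost on CPython.
static inline int handle_is(PyHandle a, PyHandle b) {
    return a._obj == b._obj;
}
```

On another VM, ``handle_is`` could instead consult whatever identity machinery that VM uses, which is the whole point of hiding the raw pointer.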

ChrisA


Re: [Python-Dev] C API changes

2018-11-28 Thread Armin Rigo
Hi Steve,

On Tue, 27 Nov 2018 at 19:14, Steve Dower  wrote:
> On 27Nov2018 0609, Victor Stinner wrote:
> > Note: Again, in my plan, the new C API would be an opt-in API. The old
> > C API would remain unchanged and fully supported. So there is no
> > impact on performance if you choose to use the old C API.
>
> This is one of the things that makes me think your plan is not feasible.

I can easily imagine the new API having two different implementations
even for CPython:

A) you can use the generic implementation, which produces a
cross-python-compatible .so.  All function calls go through the API at
runtime.  The same .so works on any version of CPython or PyPy.

B) you can use a different set of headers or a #define or something,
and you get a higher-performance version of your unmodified
code---with the issue that the .so only runs on the exact version of
CPython.  This is done by defining some of the functions as macros.  I
would expect this version to be of similar speed to the current C
API in most cases.

This might give a way forward: people would initially port their
extensions hoping to use the option B; once that is done, they can
easily measure---not guess---the extra performance costs of the
option A, and decide based on actual data if the difference is really
worth the additional troubles of distributing many versions.  Even if
it is, they can distribute an A version for PyPy and for unsupported
CPython versions, and add a few B versions on top of that.
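The two-option scheme can be sketched with a toy header. Everything here is invented for illustration (``MockObject``, ``api_get_value``, ``SHIM_FAST``): the same extension source compiles either against a real function (option A, opaque and portable) or against a macro that reaches into the object layout (option B, fast but version-locked), with no #ifdefs in the extension code itself.

```cpp
#include <cassert>

// Mock object standing in for the interpreter's internal representation.
struct MockObject { long value; };

// Option A: a real function call, opaque across interpreter versions.
// (In a real shim this would live in the runtime, not the header.)
long api_get_value(MockObject* o) { return o->value; }

// Option B: the same name redefined as a macro that touches the struct
// directly -- fast, but tied to this exact object layout.
#ifdef SHIM_FAST
#  define api_get_value(o) ((o)->value)
#endif

// Extension code is written once, with no #ifdefs of its own; which
// implementation it gets is decided entirely at build time.
long twice(MockObject* o) { return 2 * api_get_value(o); }
```

Building with ``-DSHIM_FAST`` would select the macro version; the default build goes through the function, matching Armin's "measure, don't guess" workflow.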


...Also, although I'm discussing it here, I think the whole approach
would be better if done as a third-party extension for now, without
requiring changes to CPython---just use the existing C API to
implement the CPython version.  The B option discussed above can even
be mostly *just* a set of macros, with a bit of runtime that we might
as well include in the produced .so in order to make it a standalone,
regular CPython C extension module.


A bientôt,

Armin.

PS: on CPython could use ``typedef struct { PyObject *_obj; }
PyHandle;``.  This works like a pointer, but you can't use ``==`` to
compare them.


Re: [Python-Dev] C API changes

2018-11-27 Thread Gregory P. Smith
On Mon, Nov 26, 2018 at 4:10 PM Larry Hastings  wrote:

> On 11/23/18 5:15 AM, Armin Rigo wrote:
>
> Also FWIW, my own 2 cents on the topic of changing the C API: let's
> entirely drop ``PyObject *`` and instead use more opaque
> handles---like a ``PyHandle`` that is defined as a pointer-sized C
> type but is not actually directly a pointer.  The main difference this
> would make is that the user of the API cannot dereference anything
> from the opaque handle, nor directly compare handles with each other
> to learn about object identity.  They would work exactly like Windows
> handles or POSIX file descriptors.
>
>
> Why would this be better than simply returning the pointer?  Sure, it
> prevents ever dereferencing the pointer and messing with the object, it is
> true.  So naughty people would be prevented from messing with the object
> directly instead of using the API as they should.  But my understanding is
> that the implementation would be slightly slower--there'd be all that
> looking up objects based on handles, and managing the handle namespace
> too.  I'm not convinced the nice-to-have of "you can't dereference the
> pointer anymore" is worth this runtime overhead.
>
> Or maybe you have something pretty cheap in mind, e.g. "handle = pointer ^
> 49"?  Or even "handle = pointer ^ (random odd number picked at startup)" to
> punish the extra-naughty?
>
Heck, it'd be fine if someone's implementation (such as a simple shim for
CPython's existing API) wants to internally keep a PyObject structure and
have PyHandle's implementation just be a typecast from PyObject* to
PyHandle.  The real point is that a handle is opaque and cannot be depended
on by any API _user_ as being a pointer.  What it means behind the scenes
of a given VM is left entirely up to the VM.

When an API returns a handle, that is an implicit internal INCREF if a VM
is reference counting.  When code calls an API that consumes a handle by
taking ownership of it for itself (Py_DECREF could be considered one of
these if you have a Py_DECREF equivalent API) that means "I can no longer
use this handle".

Comparisons get documented as being invalid, pointing to the API to call
for an identity check, but it is up to each implementation to decide if it
wants to force the handles to be unique. Anyone depending on that behavior
is being bad and should not be supported.

-gps

PS ... use C++ and you could actually make handle identity comparisons do
the right thing...


Re: [Python-Dev] C API changes

2018-11-27 Thread Steve Dower

On 27Nov2018 0609, Victor Stinner wrote:

Note: Again, in my plan, the new C API would be an opt-in API. The old
C API would remain unchanged and fully supported. So there is no
impact on performance if you consider to use the old C API.


This is one of the things that makes me think your plan is not feasible.

I *hope* that remaining on the old C API eventually has a performance 
impact, since the whole point is to enable new optimizations that 
currently require tricky emulation to remain compatible with the old 
API. If we never have to add any emulation for the old API, we haven't 
added anything useful for the new one.


Over time, the old C API's performance (not functionality) should 
degrade as the new C API's performance increases. If the increase isn't 
significantly better than the degradation, the whole project can be 
declared a failure, as we would have been better off leaving the API 
alone and not changing anything.


But this is great discussion. Looking forward to seeing some of it turn 
into reality :)


Cheers,
Steve


Re: [Python-Dev] C API changes

2018-11-27 Thread Victor Stinner
On Tue, Nov 27, 2018 at 01:13, Larry Hastings  wrote:
> (...) I'm not convinced the nice-to-have of "you can't dereference the 
> pointer anymore" is worth this runtime overhead.

About the general idea of a new C API.

If you only look at CPython in release mode, there is no benefit. But
you should consider the overall picture:

* ability to distribute a single binary for CPython in release mode,
CPython in debug mode, PyPy, and maybe some new more funky Python
runtimes
* better performance on PyPy

The question is whether we can implement new optimizations in CPython (like
tagged pointers) which would move the overall performance impact to at
least "not significant" (not slower, not faster), or maybe even to
"faster".

Note: Again, in my plan, the new C API would be an opt-in API. The old
C API would remain unchanged and fully supported. So there is no
impact on performance if you choose to use the old C API.

Victor


Re: [Python-Dev] C API changes

2018-11-26 Thread Nathaniel Smith
On Mon, Nov 26, 2018 at 6:12 PM Eric V. Smith  wrote:
> I thought the important part of the proposal was to have multiple
> PyHandles that point to the same PyObject (you couldn't "directly
> compare handles with each other to learn about object identity"). But
> I'll admit I'm not sure why this would be a win. Then of course they
> couldn't be regular pointers.

Whenever PyPy passes an object from PyPy -> C, then it has to invent a
"PyObject*" to represent the PyPy object. 0.1% of the time, the C code
will use C pointer comparison to implement an "is" check on this
PyObject*. But PyPy doesn't know which 0.1% of the time this will
happen, so 100% of the time an object goes from PyPy -> C, PyPy has to
check and update some global intern table to figure out whether this
particular object has ever made the transition before and use the same
PyObject*. 99.9% of the time, this is pure overhead, and it slows down
one of *the* most common operations C extension code does. If C
extensions checked object identity using some explicit operation like
PyObject_Is() instead of comparing pointers, then PyPy could defer the
expensive stuff until someone actually called PyObject_Is().
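A toy model of this cost (all names invented: ``InternalObj`` stands in for a PyPy object, ``CProxy`` for the ``PyObject*`` handed to C) shows why the intern table must be consulted on every transition, not just the rare identity checks:

```cpp
#include <cassert>
#include <unordered_map>

struct InternalObj { int id; };
struct CProxy { InternalObj* target; };

// Global intern table mapping internal objects to their C-side proxies.
static std::unordered_map<InternalObj*, CProxy*> intern_table;

// Because C code may compare the returned pointers with ==, the same
// internal object must always yield the same proxy -- so this lookup
// is paid on 100% of transitions, even though identity is only
// checked a tiny fraction of the time.  (Proxies are deliberately
// leaked here to keep the sketch short.)
CProxy* to_c(InternalObj* o) {
    auto it = intern_table.find(o);
    if (it != intern_table.end()) return it->second;
    CProxy* p = new CProxy{o};
    intern_table[o] = p;
    return p;
}
```

With an explicit ``PyObject_Is()``-style call, ``to_c`` could instead hand out fresh proxies unconditionally and do the interning only inside the identity check.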

Note: numbers are made up and I have no idea how much overhead this
actually adds. But I'm pretty sure this is the basic idea that Armin's
talking about.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] C API changes

2018-11-26 Thread MRAB

On 2018-11-27 00:08, Larry Hastings wrote:

On 11/23/18 5:15 AM, Armin Rigo wrote:

Also FWIW, my own 2 cents on the topic of changing the C API: let's
entirely drop ``PyObject *`` and instead use more opaque
handles---like a ``PyHandle`` that is defined as a pointer-sized C
type but is not actually directly a pointer.  The main difference this
would make is that the user of the API cannot dereference anything
from the opaque handle, nor directly compare handles with each other
to learn about object identity.  They would work exactly like Windows
handles or POSIX file descriptors.


Why would this be better than simply returning the pointer? Sure, it 
prevents ever dereferencing the pointer and messing with the object, it 
is true.  So naughty people would be prevented from messing with the 
object directly instead of using the API as they should.  But my 
understanding is that the implementation would be slightly 
slower--there'd be all that looking up objects based on handles, and 
managing the handle namespace too.  I'm not convinced the nice-to-have 
of "you can't dereference the pointer anymore" is worth this runtime 
overhead.


Or maybe you have something pretty cheap in mind, e.g. "handle = pointer 
^ 49"?  Or even "handle = pointer ^ (random odd number picked at 
startup)" to punish the extra-naughty?


An advantage would be that objects could be moved to reduce memory 
fragmentation.



Re: [Python-Dev] C API changes

2018-11-26 Thread Eric V. Smith

On 11/26/2018 7:08 PM, Larry Hastings wrote:

On 11/23/18 5:15 AM, Armin Rigo wrote:

Also FWIW, my own 2 cents on the topic of changing the C API: let's
entirely drop ``PyObject *`` and instead use more opaque
handles---like a ``PyHandle`` that is defined as a pointer-sized C
type but is not actually directly a pointer.  The main difference this
would make is that the user of the API cannot dereference anything
from the opaque handle, nor directly compare handles with each other
to learn about object identity.  They would work exactly like Windows
handles or POSIX file descriptors.


Why would this be better than simply returning the pointer? Sure, it 
prevents ever dereferencing the pointer and messing with the object, it 
is true.  So naughty people would be prevented from messing with the 
object directly instead of using the API as they should.  But my 
understanding is that the implementation would be slightly 
slower--there'd be all that looking up objects based on handles, and 
managing the handle namespace too.  I'm not convinced the nice-to-have 
of "you can't dereference the pointer anymore" is worth this runtime 
overhead.


Or maybe you have something pretty cheap in mind, e.g. "handle = pointer 
^ 49"?  Or even "handle = pointer ^ (random odd number picked at 
startup)" to punish the extra-naughty?


I thought the important part of the proposal was to have multiple 
PyHandles that point to the same PyObject (you couldn't "directly 
compare handles with each other to learn about object identity"). But 
I'll admit I'm not sure why this would be a win. Then of course they 
couldn't be regular pointers.


Eric


Re: [Python-Dev] C API changes

2018-11-26 Thread Larry Hastings

On 11/23/18 5:15 AM, Armin Rigo wrote:

Also FWIW, my own 2 cents on the topic of changing the C API: let's
entirely drop ``PyObject *`` and instead use more opaque
handles---like a ``PyHandle`` that is defined as a pointer-sized C
type but is not actually directly a pointer.  The main difference this
would make is that the user of the API cannot dereference anything
from the opaque handle, nor directly compare handles with each other
to learn about object identity.  They would work exactly like Windows
handles or POSIX file descriptors.


Why would this be better than simply returning the pointer? Sure, it 
prevents ever dereferencing the pointer and messing with the object, it 
is true.  So naughty people would be prevented from messing with the 
object directly instead of using the API as they should.  But my 
understanding is that the implementation would be slightly 
slower--there'd be all that looking up objects based on handles, and 
managing the handle namespace too.  I'm not convinced the nice-to-have 
of "you can't dereference the pointer anymore" is worth this runtime 
overhead.


Or maybe you have something pretty cheap in mind, e.g. "handle = pointer 
^ 49"?  Or even "handle = pointer ^ (random odd number picked at 
startup)" to punish the extra-naughty?
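The "pretty cheap" variant can be sketched as follows; ``to_handle``/``from_handle`` and ``handle_key`` are invented names, and the fixed 49 merely stands in for the random odd number picked at startup. Conversion is one XOR each way, and dereferencing the raw handle value would fault on most platforms since XOR with an odd key misaligns the address.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a random odd number chosen at interpreter startup.
static const uintptr_t handle_key = 49;

// Disguise and un-disguise a pointer with a single XOR; naughty code
// that treats the handle as a pointer gets a garbage address.
uintptr_t to_handle(void* p)       { return (uintptr_t)p ^ handle_key; }
void*     from_handle(uintptr_t h) { return (void*)(h ^ handle_key); }
```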




//arry/


Re: [Python-Dev] C API changes

2018-11-26 Thread Stefan Behnel
Armin Rigo schrieb am 26.11.18 um 06:37:
> On Sun, 25 Nov 2018 at 10:15, Stefan Behnel wrote:
>> Overall, this seems like something that PyPy could try out as an
>> experiment, by just taking a simple extension module and replacing all
>> increfs with newref assignments. And obviously implementing the whole thing
>> for the C-API
> 
> Just to be clear, I suggested making a new API, not just tweaking
> Py_INCREF() and hoping that all the rest works as it is.  I'm
> skeptical about that.

Oh, I'm not skeptical at all. I'm actually sure that it's not that easy. I
would guess that such an automatic transformation should work in something
like 70% of the cases. Another 25% should be trivial to fix manually, and
the remaining 5% … well. They can probably still be changed with some
thinking and refactoring. That also involves cases where pointer equality
is used to detect object identity. Having a macro for that might be a good
idea.

Overall, relatively easy. And therefore not unlikely to happen. The lower
the bar, the more likely we will see adoption.

Also note that explicit Py_INCREF() calls are actually not that common. I
just checked and found only 465 calls in 124K lines of Cython generated C
code for Cython itself, and 725 calls in 348K C lines of lxml. Not exactly
a snap, but definitely not huge. All other objects originate from the C-API
in one way or another, which you control.


> To start with, a ``Py_NEWREF()`` like you describe *will* lead people
> just renaming all ``Py_INCREF()`` to ``Py_NEWREF()`` ignoring the
> return value, because that's the easiest change and it would work fine
> on CPython.

First of all, as long as Py_INCREF() is not going away, they probably won't
change anything. Therefore, before we discuss how laziness will hinder the
adoption, I would rather like to see an actual motivation for them to do
it. And since this change seems to have zero advantages in CPython, but
adds a tiny bit of complexity, I think it's now up to PyPy to show that
this added complexity has an advantage that is large enough to motivate
it. If you could come up with a prototype that demonstrates the advantage
(or at least uncovers the problems we'd face), we could actually discuss
real solutions rather than uncertain ideas.

Stefan

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API changes

2018-11-26 Thread E. Madison Bray
On Fri, Nov 23, 2018 at 2:22 PM Armin Rigo  wrote:
>
> Hi Hugo, hi all,
>
> On Sun, 18 Nov 2018 at 22:53, Hugh Fisher  wrote:
> > I suggest that for the language reference, use the license plate
> > or registration analogy to introduce "handle" and after that use
> > handle throughout. It's short, distinctive, and either will match
> > up with what the programmer already knows or won't clash if
> > or when they encounter handles elsewhere.
>
> FWIW, a "handle" is typically something that users of an API store and
> pass around, and which can be used to do all operations on some
> object.  It is whatever a specific implementation needs to describe
> references to an object.  In the CPython C API, this is ``PyObject*``.
> I think that using "handle" for something more abstract is just going
> to create confusion.
>
> Also FWIW, my own 2 cents on the topic of changing the C API: let's
> entirely drop ``PyObject *`` and instead use more opaque
> handles---like a ``PyHandle`` that is defined as a pointer-sized C
> type but is not actually directly a pointer.  The main difference this
> would make is that the user of the API cannot dereference anything
> from the opaque handle, nor directly compare handles with each other
> to learn about object identity.  They would work exactly like Windows
> handles or POSIX file descriptors.  These handles would be returned by
> C API calls, and would need to be closed when no longer used.  Several
> different handles may refer to the same object, which stays alive for
> at least as long as there are open handles to it.  Doing it this way
> would untangle the notion of objects from their actual implementation.
> In CPython objects would internally use reference counting, a handle
> is really just a PyObject pointer in disguise, and closing a handle
> decreases the reference counter.  In PyPy we'd have a global table of
> "open objects", and a handle would be an index in that table; closing
> a handle means writing NULL into that table entry.  No emulated
> reference counting needed: we simply use the existing GC to keep alive
> objects that are referenced from one or more table entries.  The cost
> is limited to a single indirection.

+1

As another point of reference, if you're interested, I've been working
lately on the special purpose computer algebra system GAP.  It also
uses an approach like this: Objects are referenced throughout via an
opaque "Obj" type (which is really just a typedef of "Bag", the
internal storage reference handle of its "GASMAN" garbage collector
[1]).  A nice benefit of this, along with the others discussed above,
is that it has been relatively easy to replace the garbage collector
in GAP--there are options for it to use Boehm-GC, as well as Julia's
GC.

GAP has its own problems, but it's relatively simple and has been
inspiring to look at; I was coincidentally wondering just recently if
there's anything Python could take from it (conversely, I'm trying to
bring some things I've learned from Python to improve GAP...).

[1] https://github.com/gap-system/gap/blob/master/src/gasman.c
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API changes

2018-11-25 Thread Armin Rigo
Hi,

On Sun, 25 Nov 2018 at 10:15, Stefan Behnel  wrote:
> Overall, this seems like something that PyPy could try out as an
> experiment, by just taking a simple extension module and replacing all
> increfs with newref assignments. And obviously implementing the whole thing
> for the C-API

Just to be clear, I suggested making a new API, not just tweaking
Py_INCREF() and hoping that all the rest works as it is.  I'm
skeptical about that.

To start with, a ``Py_NEWREF()`` like you describe *will* lead people
just renaming all ``Py_INCREF()`` to ``Py_NEWREF()`` ignoring the
return value, because that's the easiest change and it would work fine
on CPython.


A bientôt,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API changes

2018-11-25 Thread Stefan Behnel
Hi Armin,

Armin Rigo schrieb am 25.11.18 um 06:15:
> On Sat, 24 Nov 2018 at 22:17, Stefan Behnel wrote:
>> Couldn't this also be achieved via reference counting? Count only in C
>> space, and delete the "open object" when the refcount goes to 0?
> 
> The point is to remove the need to return the same handle to C code if
> the object is the same one.  This saves one of the largest costs of
> the C API emulation, which is looking up the object in a big
> dictionary to know if there is already a ``PyObject *`` that
> corresponds to it or not---for *all* objects that go from Python to C.

Ok, got it. And since the handle is a simple integer, there's also no
additional cost for memory allocation on the way out.


> Once we do that, then there is no need for a refcount any more.  Yes,
> you could add your custom refcount code in C, but in practice it is
> rarely done.  For example, with POSIX file descriptors, when you would
> need to "incref" a file descriptor, you instead use dup().  This gives
> you a different file descriptor which can be closed independently of
> the original one, but they both refer to the same file.

Ok, then an INCREF() would be replaced by such a dup() call that creates
and returns a new handle. In CPython, it would just INCREF and return the
PyObject*, which is as fast as the current Py_INCREF().

For PyPy, however, that means that increfs become more costly. One of the
outcomes of a recent experiment with tagged pointers for integers was that
they make increfs and decrefs more expensive, and (IIUC) that reduced the
overall performance quite visibly. In the case of pointers, it's literally
just adding a tiny condition that makes this so much slower. In the case of
handles, it would add a lookup and a reference copy in the handles array.
That's way more costly already than just the simple condition.

Now, it's unclear if this performance degradation is specific to CPython
(where PyObject* is native), or if it would also apply to PyPy. But I guess
the only way to find this out would be to try it.

IIUC, the only thing that is needed is to replace

Py_INCREF(obj);

with

obj = Py_NEWREF(obj);

which CPython would implement as

#define Py_NEWREF(obj)  (Py_INCREF(obj), obj)

Py_DECREF() would then just invalidate and clean up the handle under the hood.

There are probably some places in user code where this would end up leaking
a reference by accident because of unclean reference handling (it could
overwrite the old handle in the case of a temporary INCREF/DECREF cycle),
but it might still be enough for trying it out. We could definitely switch
to this pattern in Cython (in fact, we already use such a NEWREF macro in a
couple of places, since it's a common pattern).

Overall, this seems like something that PyPy could try out as an
experiment, by just taking a simple extension module and replacing all
increfs with newref assignments. And obviously implementing the whole thing
for the C-API, but IIUC, you might be able to tweak that into your cpyext
wrapping layer somehow, without manually rewriting all C-API functions?

Stefan

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API changes

2018-11-24 Thread Armin Rigo
Hi Stefan,

On Sat, 24 Nov 2018 at 22:17, Stefan Behnel  wrote:
> Couldn't this also be achieved via reference counting? Count only in C
> space, and delete the "open object" when the refcount goes to 0?

The point is to remove the need to return the same handle to C code if
the object is the same one.  This saves one of the largest costs of
the C API emulation, which is looking up the object in a big
dictionary to know if there is already a ``PyObject *`` that
corresponds to it or not---for *all* objects that go from Python to C.

Once we do that, then there is no need for a refcount any more.  Yes,
you could add your custom refcount code in C, but in practice it is
rarely done.  For example, with POSIX file descriptors, when you would
need to "incref" a file descriptor, you instead use dup().  This gives
you a different file descriptor which can be closed independently of
the original one, but they both refer to the same file.


A bientôt,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API changes

2018-11-24 Thread Stefan Behnel
Armin Rigo schrieb am 23.11.18 um 14:15:
> In PyPy we'd have a global table of
> "open objects", and a handle would be an index in that table; closing
> a handle means writing NULL into that table entry.  No emulated
> reference counting needed: we simply use the existing GC to keep alive
> objects that are referenced from one or more table entries.  The cost
> is limited to a single indirection.

Couldn't this also be achieved via reference counting? Count only in C
space, and delete the "open object" when the refcount goes to 0?

Stefan

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API changes

2018-11-24 Thread Nick Coghlan
On Fri, 23 Nov 2018 at 23:24, Armin Rigo  wrote
(regarding opaque "handles" in the C API):
> The C API would change a lot, so it's not reasonable to do that in the
> CPython repo.  But it could be a third-party project, attempting to
> define an API like this and implement it well on top of both CPython
> and PyPy.  IMHO this might be a better idea than just changing the API
> of functions defined long ago to make them more regular (e.g. stop
> returning borrowed references); by now this would mostly mean creating
> more work for the PyPy team to track and adapt to the changes, with no
> real benefits.

And the nice thing about doing it as a shim is that it can be applied
to *existing* versions of CPython, rather than having to wait for new
ones.

Node.js started switching over to doing things this way last year, and
it's a good way to go about it:
https://medium.com/the-node-js-collection/n-api-next-generation-node-js-apis-for-native-modules-169af5235b06

While this would still be a difficult project to pursue, and would
suffer from many of the same barriers to adoption as CPython's native
stable ABI, it does offer a concrete benefit to 3rd party module
authors: being able to create single wheel files that can be shared
across multiple Python versions, rather than needing to be built
separately for each one.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] C API changes

2018-11-23 Thread Stefan Krah


Armin Rigo wrote:
> The C API would change a lot, so it's not reasonable to do that in the
> CPython repo.  But it could be a third-party project, attempting to
> define an API like this and implement it well on top of both CPython
> and PyPy.  IMHO this might be a better idea than just changing the API
> of functions defined long ago to make them more regular (e.g. stop
> returning borrowed references); by now this would mostly mean creating
> more work for the PyPy team to track and adapt to the changes, with no
> real benefits.

I like this idea.  For example, when writing two versions of a C module,
one that uses CPython internals indiscriminately and another that uses
a "clean" API, such a third-party project would help.

I'd also be more motivated to write two versions if I know that the
project is supported by PyPy devs.


Do you think that such an API might be faster than CFFI on PyPy?


Stefan Krah



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] C API changes

2018-11-23 Thread Armin Rigo
Hi Hugo, hi all,

On Sun, 18 Nov 2018 at 22:53, Hugh Fisher  wrote:
> I suggest that for the language reference, use the license plate
> or registration analogy to introduce "handle" and after that use
> handle throughout. It's short, distinctive, and either will match
> up with what the programmer already knows or won't clash if
> or when they encounter handles elsewhere.

FWIW, a "handle" is typically something that users of an API store and
pass around, and which can be used to do all operations on some
object.  It is whatever a specific implementation needs to describe
references to an object.  In the CPython C API, this is ``PyObject*``.
I think that using "handle" for something more abstract is just going
to create confusion.

Also FWIW, my own 2 cents on the topic of changing the C API: let's
entirely drop ``PyObject *`` and instead use more opaque
handles---like a ``PyHandle`` that is defined as a pointer-sized C
type but is not actually directly a pointer.  The main difference this
would make is that the user of the API cannot dereference anything
from the opaque handle, nor directly compare handles with each other
to learn about object identity.  They would work exactly like Windows
handles or POSIX file descriptors.  These handles would be returned by
C API calls, and would need to be closed when no longer used.  Several
different handles may refer to the same object, which stays alive for
at least as long as there are open handles to it.  Doing it this way
would untangle the notion of objects from their actual implementation.
In CPython objects would internally use reference counting, a handle
is really just a PyObject pointer in disguise, and closing a handle
decreases the reference counter.  In PyPy we'd have a global table of
"open objects", and a handle would be an index in that table; closing
a handle means writing NULL into that table entry.  No emulated
reference counting needed: we simply use the existing GC to keep alive
objects that are referenced from one or more table entries.  The cost
is limited to a single indirection.

The C API would change a lot, so it's not reasonable to do that in the
CPython repo.  But it could be a third-party project, attempting to
define an API like this and implement it well on top of both CPython
and PyPy.  IMHO this might be a better idea than just changing the API
of functions defined long ago to make them more regular (e.g. stop
returning borrowed references); by now this would mostly mean creating
more work for the PyPy team to track and adapt to the changes, with no
real benefits.


A bientôt,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com