[Python-Dev] Re: PEP: Modify the C API to hide implementation details

2020-04-10 Thread Victor Stinner
Le ven. 10 avr. 2020 à 22:00, Antoine Pitrou  a écrit :
> > Debug runtime and remove debug checks in release mode
> > .....................................................
> >
> > If the C extensions are no longer tied to CPython internals, it becomes
> > possible to switch to a Python runtime built in debug mode to enable
> > runtime debug checks to ease debugging C extensions.
>
> That's the one convincing feature in this PEP, as far as I'm concerned.

In fact, I already implemented this feature in Python 3.8:
https://docs.python.org/dev/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build

The feature is implemented on most platforms, except, sadly, on
Android, Cygwin and Windows.

You can now switch between a release build of Python and a debug build
of Python without having to rebuild your C extensions which were
compiled in release mode. If you want, you can use a debug build of
some C extensions. You now have many options: the debug ABI is now
compatible with the release ABI.
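For context, here is a simplified excerpt of CPython's object.h
showing why the two ABIs used to be incompatible: old debug builds
defined Py_TRACE_REFS, which added two extra pointers at the head of
every object and therefore changed the memory layout that release-mode
extensions were compiled against. Python 3.8 stopped defining
Py_TRACE_REFS in the default debug build, so both builds now share the
same object layout.

typedef struct _object {
#ifdef Py_TRACE_REFS
    /* Debug-only doubly-linked list of all live objects: these fields
       used to shift the offset of everything below them. */
    struct _object *_ob_next;
    struct _object *_ob_prev;
#endif
    Py_ssize_t ob_refcnt;       /* reference counter */
    PyTypeObject *ob_type;      /* pointer to the object's type */
} PyObject;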

This PEP section is mostly a call to remove debug checks in release
mode :-) In my latest attempt, I failed to explain that the debug
build is now easy enough to be used by developers in practice (I never
finished my article explaining how to use it):
https://bugs.python.org/issue37406

Steve: the use case is to debug very rare Python crashes (e.g. once
every two months) for customers who fail to provide a reproducer. My
*expectation* is that a debug build should help to reproduce the bug
and/or provide more information when the bug happens. My motivation
for this feature is also to show that the bug is not in Python but in
third-party C extensions ;-)
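To illustrate, here is a hypothetical buggy extension function (a
sketch, not code from any real project) of the kind such debug checks
catch. A release build typically crashes much later, at an unrelated
point; a debug build's extra checks (Py_REF_DEBUG, pymalloc debug
hooks) abort close to the faulty call instead.

#include <Python.h>

static PyObject *
get_first_item(PyObject *self, PyObject *args)
{
    PyObject *list;
    if (!PyArg_ParseTuple(args, "O!", &PyList_Type, &list)) {
        return NULL;
    }
    /* BUG: PyList_GetItem() returns a *borrowed* reference, so a
       Py_INCREF(item) is missing here. The caller will Py_DECREF()
       the item once too often, freeing it while the list still holds
       it; the next access touches freed memory. */
    PyObject *item = PyList_GetItem(list, 0);
    return item;
}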

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.


[Python-Dev] Re: PEP: Modify the C API to hide implementation details

2020-04-10 Thread Victor Stinner
Le ven. 10 avr. 2020 à 22:00, Antoine Pitrou  a écrit :
> How do you keep fast type checking such as PyTuple_Check() if extension
> code doesn't have access e.g. to tp_flags?
>
> I notice you did:
> """
> Add fast inlined version _PyType_HasFeature() and _PyType_IS_GC()
> for object.c and typeobject.c.
> """
>
> So you understand there is a need.

By the way, CPython currently uses statically allocated types for
builtin types like str or list. This may have to change to run
multiple subinterpreters efficiently in parallel: each subinterpreter
should have its own heap-allocated type with its own reference
counter.

Using heap-allocated types means that the PyUnicode_Check()
implementation has to change. It's just another good reason to better
hide the PyUnicode_Check() implementation right now ;-)
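Concretely, the check is currently an inlined macro (simplified here
from CPython's headers) that reads tp_flags directly; a hypothetical
opaque variant with the same semantics could fetch the flags through a
function call, so the type structure could move to the heap (one per
subinterpreter) without breaking compiled extensions:

/* Today (simplified): the PyTypeObject layout leaks into every
   compiled extension. */
#define PyUnicode_Check(op) \
    PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_UNICODE_SUBCLASS)

/* Hypothetical opaque variant: same result, no structure access. */
static inline int
PyUnicode_CheckOpaque(PyObject *op)
{
    return (PyType_GetFlags(Py_TYPE(op)) & Py_TPFLAGS_UNICODE_SUBCLASS) != 0;
}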

Victor


[Python-Dev] Re: PEP: Modify the C API to hide implementation details

2020-04-10 Thread Victor Stinner
Le ven. 10 avr. 2020 à 22:00, Antoine Pitrou  a écrit :
> > Examples of issues to make structures opaque:
> >
> > * ``PyGC_Head``: https://bugs.python.org/issue40241
> > * ``PyObject``: https://bugs.python.org/issue39573
> > * ``PyTypeObject``: https://bugs.python.org/issue40170
>
> How do you keep fast type checking such as PyTuple_Check() if extension
> code doesn't have access e.g. to tp_flags?

Hum. I should clarify that we have the choice of having no impact on
performance for the regular runtime: only use opaque function calls
for the "new" runtime. This is exactly what is already done with
Py_LIMITED_API. Concrete example:

static inline int
PyType_HasFeature(PyTypeObject *type, unsigned long feature)
{
#ifdef Py_LIMITED_API
    /* Opaque function call: no access to the PyTypeObject layout. */
    return ((PyType_GetFlags(type) & feature) != 0);
#else
    /* Direct structure access: faster, but hard-codes the layout. */
    return ((type->tp_flags & feature) != 0);
#endif
}

With Py_LIMITED_API, the check goes through a PyType_GetFlags()
function call; otherwise, the PyTypeObject.tp_flags field is accessed
directly.

I recently modified this function to:

static inline int
PyType_HasFeature(PyTypeObject *type, unsigned long feature)
{
    return ((PyType_GetFlags(type) & feature) != 0);
}

I consider that checking a type is not performance critical, so I
chose to have the same implementation for everyone. If someone shows
that it's a major performance overhead, we can revisit this choice and
reintroduce an #ifdef.

It's more of a practical issue about maintaining two flavors of
Python in the same code base. Do you want to have two implementations
of each function? Or is it possible to have a single implementation
for some functions?

I suggest reducing the code duplication and accepting a performance
overhead when it's small enough.


> > O(1) bytearray to bytes conversion
> > ..................................
> >
> > Convert bytearray to bytes without memory copy.
> > (...)
>
> If that's desirable (I'm not sure it is), (...)

Hum, maybe I should clarify the whole "New optimized CPython runtime"
section. The listed optimizations are not optimizations that must be
implemented. They are examples of optimizations which become possible,
or at least easier, to implement once the C API is fixed.

I'm not sure that "bytearray to bytes conversion" is a performance
bottleneck. It's just that such an optimization is easier to explain
than other, more complex optimizations ;-)

The intent of this PEP is not to design a faster CPython, but to show
that reworking the C API makes such a faster CPython possible.


> > Fork and "Copy-on-Read" problem
> > ...............................
> >
> > Solve the "Copy on read" problem with fork: store reference counter
> > outside ``PyObject``.
>
> Nowadays it is strongly recommended to use multiprocessing with the
> "forkserver" start method:
> https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

I understood that the Instagram workload is to load heavy data only
once, and fork later. I'm not sure that forkserver fits such a
workload.


> > One solution for that would be to store reference counters outside
> > ``PyObject``. For example, in a separated hash table (pointer to
> > reference counter). Changing ``PyObject`` structures requires that C
> > extensions don't access them directly.
>
> You're planning to introduce a large overhead for each reference
> count lookup just to satisfy a rather niche use case?  CPython
> probably does millions of reference counts per second.

Sorry, again, I'm not proposing to move ob_refcnt outside PyObject
for everyone. The intent is to show that it becomes possible to do so
if you have a very specific use case where it would be more efficient.
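For illustration only, here is a toy sketch (not CPython code; all
names are hypothetical) of what "reference counters stored outside the
object" could look like: a side table keyed by object address, so that
incref/decref never write to the page holding the object itself, and
fork()'d children no longer dirty object pages just by using objects.
The per-operation table lookup is exactly the overhead Antoine points
out.

#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE (1u << 20)   /* must be a power of two */

typedef struct { void *obj; long refcnt; } Slot;
static Slot table[TABLE_SIZE];

/* Open addressing with linear probing; a real implementation would
   grow the table and remove entries when counters reach zero. */
static size_t
slot_for(void *obj)
{
    size_t i = ((uintptr_t)obj >> 4) & (TABLE_SIZE - 1);
    while (table[i].obj != NULL && table[i].obj != obj) {
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    table[i].obj = obj;
    return i;
}

static void ext_incref(void *obj) { table[slot_for(obj)].refcnt++; }

/* Returns the new count; the caller frees the object when it hits 0. */
static long ext_decref(void *obj) { return --table[slot_for(obj)].refcnt; }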

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.


[Python-Dev] Re: PEP: Modify the C API to hide implementation details

2020-04-10 Thread Antoine Pitrou
On Fri, 10 Apr 2020 23:33:28 +0100
Steve Dower  wrote:
> On 10Apr2020 2055, Antoine Pitrou wrote:
> > On Fri, 10 Apr 2020 19:20:00 +0200
> > Victor Stinner  wrote:  
> >>
> >> Note: Cython and cffi should be preferred to write new C extensions.
> >> This PEP is about existing C extensions which cannot be rewritten with
> >> Cython.  
> > 
> > Using Cython does not make the C API irrelevant.  In some
> > applications, the C API has to be low-level enough for performance.
> > Whether the application is written in Cython or not.  
> 
> It does to the code author.
> 
> The point here is that we want authors who insist on coding against the 
> C API to be aware that they have fewer compatibility guarantees [...]

Yeah, you missed the point of my comment here.  Cython *does* call into
the C API, and it's quite insistent on performance optimizations too.
Saying "just use Cython" doesn't make the C API unimportant - it just
hides it from your own sight.

> - maybe 
> even to the point of needing to rebuild for each minor version if you 
> want to insist on using macros (i.e. anything starting with "_Py").

If there's still a way for C extensions to get at now-private APIs,
then the PEP fails to convey that, IMHO.

> >> **Backward compatibility:** backward incompatible on purpose. Break the
> >> limited C API and the stable ABI, with the assumption that `Most C
> >> extensions don't rely directly on CPython internals`_ and so will remain
> >> compatible.  
> > 
> > The problem here is not only compatibility but potential performance
> > regressions in C extensions.  
> 
> I don't think we've ever guaranteed performance between releases. 
> Correctness, sure, but not performance.

That's a rather weird argument.  Just because you don't guarantee
performance doesn't mean it's ok to introduce performance regressions.

It's especially a weird argument to make when discussing a PEP where
most of the arguments are distant promises of improved performance.

> >> Fork and "Copy-on-Read" problem
> >> ...............................
> >>
> >> Solve the "Copy on read" problem with fork: store reference counter
> >> outside ``PyObject``.  
> > 
> > Nowadays it is strongly recommended to use multiprocessing with the
> > "forkserver" start method:
> > https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
> > 
> > With "forkserver", the forked process is extremely lightweight and
> > there are little savings to be made in the child.  
> 
> Unfortunately, a recommendation that only applies to a minority of 
> Python users. Oh well.

Which "minority" are you talking about?  Neither of us has numbers, but
I'm quite sure that the population of Python users calling into
multiprocessing (or a third-party library relying on multiprocessing,
such as Dask) is much larger than the population of Python users
calling fork() directly and relying on copy-on-write for optimization
purposes.

But if you have a different experience to share, please do so.

> Separating refcounts theoretically improves cache locality, specifically 
> the case where cache invalidation impacts multiple CPUs (and even the 
> case where a single thread moves between CPUs).

I'm a bit curious why it would improve, rather than degrade, cache
locality. If you take the typical example of the eval loop, an object
is incref'ed and decref'ed just about the same time that it gets used.

I'll also note that the PEP proposes to remove APIs which return
borrowed references... thereby increasing the number of cases where
accessing an object implies updating its refcount.

Therefore I'm unconvinced that stashing refcounts in a separate memory
area would provide any CPU efficiency benefit.

> >> Debug runtime and remove debug checks in release mode
> >> .....................................................
> >>
> >> If the C extensions are no longer tied to CPython internals, it becomes
> >> possible to switch to a Python runtime built in debug mode to enable
> >> runtime debug checks to ease debugging C extensions.  
> > 
> > That's the one convincing feature in this PEP, as far as I'm concerned.  
> 
> Eh, this assumes that someone is fully capable of rebuilding CPython and 
> their own extension, but not one of their dependencies, [...]

You don't need to rebuild CPython if someone provides a binary debug
build (which would probably happen if such a build were compatible with
regular packages).  You also don't need to rebuild your own extension
to take advantage of the interpreter's internal correctness checks, if
the interpreter's ABI hasn't changed.

This is the whole point: being able to load an unmodified extension
(and unmodified dependencies) on a debug-checks-enabled interpreter.

> and this code 
> that they're using doesn't have any system dependencies that differ in 
> debug builds (spoiler: they do).

Are you talking about Windows?  On non-Windows systems, I don't think
there are "system dependencies that differ in debug builds".

Regards


[Python-Dev] Re: PEP: Modify the C API to hide implementation details

2020-04-10 Thread Steve Dower

On 10Apr2020 2055, Antoine Pitrou wrote:

> On Fri, 10 Apr 2020 19:20:00 +0200
> Victor Stinner  wrote:
>
>> Note: Cython and cffi should be preferred to write new C extensions.
>> This PEP is about existing C extensions which cannot be rewritten with
>> Cython.
>
> Using Cython does not make the C API irrelevant.  In some
> applications, the C API has to be low-level enough for performance.
> Whether the application is written in Cython or not.


It does to the code author.

The point here is that we want authors who insist on coding against the 
C API to be aware that they have fewer compatibility guarantees - maybe 
even to the point of needing to rebuild for each minor version if you 
want to insist on using macros (i.e. anything starting with "_Py").



>> Examples of issues to make structures opaque:
>>
>> * ``PyGC_Head``: https://bugs.python.org/issue40241
>> * ``PyObject``: https://bugs.python.org/issue39573
>> * ``PyTypeObject``: https://bugs.python.org/issue40170
>
> How do you keep fast type checking such as PyTuple_Check() if extension
> code doesn't have access e.g. to tp_flags?


Measured in isolation, sure. But what task are you doing that is being 
held up by builtin type checks?


If the type check is the bottleneck, you need to work on more 
interesting algorithms ;)



> I notice you did:
> """
> Add fast inlined version _PyType_HasFeature() and _PyType_IS_GC()
> for object.c and typeobject.c.
> """
>
> So you understand there is a need.


These are private APIs.


>> **Backward compatibility:** backward incompatible on purpose. Break the
>> limited C API and the stable ABI, with the assumption that `Most C
>> extensions don't rely directly on CPython internals`_ and so will remain
>> compatible.
>
> The problem here is not only compatibility but potential performance
> regressions in C extensions.


I don't think we've ever guaranteed performance between releases. 
Correctness, sure, but not performance.



>> New optimized CPython runtime
>> =============================
>>
>> Backward incompatible changes are such a pain for the whole Python
>> community. To ease the migration (accelerate adoption of the new C
>> API), one option is to provide not only one but two CPython runtimes:
>>
>> * Regular CPython: fully backward compatible, support direct access to
>>   structures like ``PyObject``, etc.
>> * New optimized CPython: incompatible, cannot import C extensions which
>>   don't use the limited C API, has new optimizations, limited to the C
>>   API.
>
> Well, this sounds like a distribution nightmare.  Some packages will
> only be available for one runtime and not the other.  It will confuse
> non-expert users.


Agreed (except that it will also confuse expert users). Doing "Python 
4"-by-stealth like this is a terrible idea.


If it's incompatible, give it a new version number. If you don't want a 
new version number, maintain compatibility. There are no alternatives.



>> O(1) bytearray to bytes conversion
>> ..................................
>>
>> Convert bytearray to bytes without memory copy.
>>
>> Currently, bytearray is used to build a bytes string, but it's usually
>> converted into a bytes object to respect an API. This conversion
>> requires allocating a new memory block and copying data (O(n) complexity).
>>
>> It would be possible to implement an O(1) conversion if the ownership
>> of the bytearray buffer could be passed to bytes.
>>
>> That requires modifying the ``PyBytesObject`` structure to support
>> multiple storages (support storing content in a separate memory
>> block).
>
> If that's desirable (I'm not sure it is), there is a simpler solution:
> instead of allocating a raw memory area, bytearray could allocate... a
> private bytes object that you can detach without copying it.


Yeah, I don't see the point in this one, unless you mean a purely 
internal change. Is this a major bottleneck?


Having a broader concept of "freezable" objects may be a valuable thing 
to enable in a new runtime, but retrofitting it to CPython doesn't seem 
likely to have a big impact.



Fork and "Copy-on-Read" problem
...

Solve the "Copy on read" problem with fork: store reference counter
outside ``PyObject``.


Nowadays it is strongly recommended to use multiprocessing with the
"forkserver" start method:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

With "forkserver", the forked process is extremely lightweight and
there are little savings to be made in the child.


Unfortunately, a recommendation that only applies to a minority of 
Python users. Oh well.


Separating refcounts theoretically improves cache locality, specifically 
the case where cache invalidation impacts multiple CPUs (and even the 
case where a single thread moves between CPUs). But I don't think 
there's been a convincing real benchmark of this yet.



>> Debug runtime and remove debug checks in release mode
>> .....................................................
>>
>> If the C extensions are no longer tied to CPython internals, it becomes
>> possible to switch to a Python runtime built in debug mode 

[Python-Dev] Re: PEP: Modify the C API to hide implementation details

2020-04-10 Thread Antoine Pitrou
On Fri, 10 Apr 2020 19:20:00 +0200
Victor Stinner  wrote:
> 
> Note: Cython and cffi should be preferred to write new C extensions.
> This PEP is about existing C extensions which cannot be rewritten with
> Cython.

Using Cython does not make the C API irrelevant.  In some
applications, the C API has to be low-level enough for performance.
Whether the application is written in Cython or not.

> **Status:** not started. The performance overhead must be measured with
> benchmarks and this PEP should be accepted.

Surely you mean "before this PEP should be accepted"?

> Examples of issues to make structures opaque:
> 
> * ``PyGC_Head``: https://bugs.python.org/issue40241
> * ``PyObject``: https://bugs.python.org/issue39573
> * ``PyTypeObject``: https://bugs.python.org/issue40170

How do you keep fast type checking such as PyTuple_Check() if extension
code doesn't have access e.g. to tp_flags?

I notice you did:
"""
Add fast inlined version _PyType_HasFeature() and _PyType_IS_GC()
for object.c and typeobject.c.
"""

So you understand there is a need.

> **Backward compatibility:** backward incompatible on purpose. Break the
> limited C API and the stable ABI, with the assumption that `Most C
> extensions don't rely directly on CPython internals`_ and so will remain
> compatible.

The problem here is not only compatibility but potential performance
regressions in C extensions.

> New optimized CPython runtime
> =============================
> 
> Backward incompatible changes are such a pain for the whole Python
> community. To ease the migration (accelerate adoption of the new C
> API), one option is to provide not only one but two CPython runtimes:
> 
> * Regular CPython: fully backward compatible, support direct access to
>   structures like ``PyObject``, etc.
> * New optimized CPython: incompatible, cannot import C extensions which
>   don't use the limited C API, has new optimizations, limited to the C
>   API.

Well, this sounds like a distribution nightmare.  Some packages will
only be available for one runtime and not the other.  It will confuse
non-expert users.

> O(1) bytearray to bytes conversion
> ..................................
> 
> Convert bytearray to bytes without memory copy.
> 
> Currently, bytearray is used to build a bytes string, but it's usually
> converted into a bytes object to respect an API. This conversion
> requires allocating a new memory block and copying data (O(n) complexity).
> 
> It would be possible to implement an O(1) conversion if the ownership
> of the bytearray buffer could be passed to bytes.
> 
> That requires modifying the ``PyBytesObject`` structure to support
> multiple storages (support storing content in a separate memory
> block).

If that's desirable (I'm not sure it is), there is a simpler solution:
instead of allocating a raw memory area, bytearray could allocate... a
private bytes object that you can detach without copying it.

But really, this is why we have BytesIO.  Which already uses that exact
strategy: allocate a private bytes object.
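To make the "detach" idea concrete, here is a toy sketch in plain C
(not CPython code; all names are hypothetical): the writer grows a
private heap block while building, then transfers ownership of the
block in O(1) instead of copying its contents into a new object.

#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;
    size_t len, cap;
} Builder;

/* Append n bytes, growing the block geometrically (amortized O(1)). */
static int
builder_append(Builder *b, const char *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap * 2 : 64;
        while (cap < b->len + n) {
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* The O(1) "conversion": hand the block to the caller and reset the
   builder, rather than copying len bytes into a fresh object. */
static char *
builder_detach(Builder *b, size_t *len)
{
    char *p = b->data;
    *len = b->len;
    b->data = NULL;
    b->len = b->cap = 0;
    return p;
}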

> Fork and "Copy-on-Read" problem
> ...............................
> 
> Solve the "Copy on read" problem with fork: store reference counter
> outside ``PyObject``.

Nowadays it is strongly recommended to use multiprocessing with the
"forkserver" start method:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

With "forkserver", the forked process is extremely lightweight and
there are little savings to be made in the child.

> `Dismissing Python Garbage Collection at Instagram
> `_
> (Jan 2017) by Instagram Engineering.
> 
> Instagram contributed `gc.freeze()
> `_ to Python 3.7
> which works around the issue.
> 
> One solution for that would be to store reference counters outside
> ``PyObject``. For example, in a separated hash table (pointer to
> reference counter). Changing ``PyObject`` structures requires that C
> extensions don't access them directly.

You're planning to introduce a large overhead for each reference
count lookup just to satisfy a rather niche use case?  CPython
probably does millions of reference counts per second.

> Debug runtime and remove debug checks in release mode
> .....................................................
> 
> If the C extensions are no longer tied to CPython internals, it becomes
> possible to switch to a Python runtime built in debug mode to enable
> runtime debug checks to ease debugging C extensions.

That's the one convincing feature in this PEP, as far as I'm concerned.

Regards

Antoine.


[Python-Dev] Summary of Python tracker Issues

2020-04-10 Thread Python tracker

ACTIVITY SUMMARY (2020-04-03 - 2020-04-10)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    7433 (+32)
  closed 44563 (+37)
  total  51996 (+69)

Open issues with patches: 2934 


Issues opened (50)
==

#39984: Move pending calls from _PyRuntime to PyInterpreterState
https://bugs.python.org/issue39984  reopened by vstinner

#40178: Convert the remaining os functions to Argument Clinic
https://bugs.python.org/issue40178  opened by serhiy.storchaka

#40179: Argument Clinic incorrectly translates #elif
https://bugs.python.org/issue40179  opened by serhiy.storchaka

#40180: isinstance(cls_with_metaclass, non_type) raises KeyError
https://bugs.python.org/issue40180  opened by terry.reedy

#40181: IDLE: remove positional-only note from calltips
https://bugs.python.org/issue40181  opened by terry.reedy

#40183: AC_COMPILE_IFELSE doesn't work in all cases
https://bugs.python.org/issue40183  opened by jerome.hamm

#40186: test_notify_all hangs forever in sparc64
https://bugs.python.org/issue40186  opened by BTaskaya

#40188: Azure Pipelines jobs failing randomly with: Unable to connect 
https://bugs.python.org/issue40188  opened by vstinner

#40191: tempfile.mkstemp() | Documentation Error
https://bugs.python.org/issue40191  opened by Howard Waterfall

#40192: time.thread_time isn't outputting in nanoseconds in AIX
https://bugs.python.org/issue40192  opened by BTaskaya

#40195: multiprocessing.Queue.put can fail silently due to pickle erro
https://bugs.python.org/issue40195  opened by Sander Land

#40197: Add nanoseconds to timing table in What's new python 3.8
https://bugs.python.org/issue40197  opened by mchels

#40198: macOS Python builds from Python.org ignore DYLD_LIBRARY_PATH d
https://bugs.python.org/issue40198  opened by dgelessus

#40199: Invalid escape sequence DeprecationWarnings don't trigger by d
https://bugs.python.org/issue40199  opened by Numerlor

#40202: Misleading grammatically of ValueError Message?
https://bugs.python.org/issue40202  opened by Jacob RR

#40203: Warn about invalid PYTHONUSERBASE
https://bugs.python.org/issue40203  opened by Volker Weißmann

#40204: Docs build error with Sphinx 3.0 due to invalid C declaration
https://bugs.python.org/issue40204  opened by xtreak

#40205: Profile 'builtins' parameter documentation missing
https://bugs.python.org/issue40205  opened by bar.harel

#40207: Expose NCURSES_EXT_FUNCS under curses
https://bugs.python.org/issue40207  opened by BTaskaya

#40208: Remove deprecated SymbolTable.has_exec
https://bugs.python.org/issue40208  opened by BTaskaya

#40209: read_pyfile function refactor in Lib/test/test_unparse.py
https://bugs.python.org/issue40209  opened by hakancelik

#40210: ttk.Combobox focus-out event inheritage
https://bugs.python.org/issue40210  opened by Nikolai Ehrhardt

#40211: Clarify preadv and pwritev is supported AIX 7.1 and newer.
https://bugs.python.org/issue40211  opened by BTaskaya

#40212: Re-enable posix_fallocate and posix_fadvise on AIX
https://bugs.python.org/issue40212  opened by BTaskaya

#40213: contextlib.aclosing()
https://bugs.python.org/issue40213  opened by John Belmonte

#40214: test_ctypes.test_load_dll_with_flags Windows failure
https://bugs.python.org/issue40214  opened by aeros

#40215: Use xdg basedir spec on linux
https://bugs.python.org/issue40215  opened by Óscar García Amor

#40217: The garbage collector doesn't take in account that objects of 
https://bugs.python.org/issue40217  opened by vstinner

#40218: sys.executable is a non existing path if python is executed fr
https://bugs.python.org/issue40218  opened by Volker Weißmann

#40219: ttk LabeledScale: label covered by hidden element
https://bugs.python.org/issue40219  opened by Stephen Bell

#40221: Use new _at_fork_reinit() lock method in multiprocessing
https://bugs.python.org/issue40221  opened by vstinner

#40222: "Zero cost" exception handling
https://bugs.python.org/issue40222  opened by Mark.Shannon

#40223: Add -fwrapv for new icc versions
https://bugs.python.org/issue40223  opened by skrah

#40225: recursive call within generator expression is O(depth)
https://bugs.python.org/issue40225  opened by brendon-zh...@hotmail.com

#40227: SSLError is not passed to the client during handshake
https://bugs.python.org/issue40227  opened by iivanyuk

#40228: Make setting line number in frame more robust.
https://bugs.python.org/issue40228  opened by Mark.Shannon

#40229: tty unblocking setraw and save-restore features
https://bugs.python.org/issue40229  opened by Steven Lu

#40230: Itertools.product() Out of Memory Errors
https://bugs.python.org/issue40230  opened by Henry Carscadden

#40231: Fix pending calls in subinterpreters
https://bugs.python.org/issue40231  opened by vstinner

#40232: PyOS_AfterFork_Child() should use _PyThread_at_fork_reinit()
https://bugs.python.org/issue40232  opened by 

[Python-Dev] PEP: Modify the C API to hide implementation details

2020-04-10 Thread Victor Stinner
Hi,

Here is a first draft of a PEP which summarizes the research work I
have been doing on the CPython C API since 2017, and the changes that
others and I have already made since Python 3.7 towards an "opaque" C
API. The PEP is also a collaboration with developers of PyPy, HPy,
Rust-CPython and many others! Thanks to everyone who helped me to
write it down!

Maybe this big document should be reorganized into multiple smaller,
better-defined goals: as multiple PEPs. The PEP is quite long and
talks about things which are not directly related. It's a complex
topic and I chose to put everything in a single document to have a
good starting point to open the discussion. I already proposed some of
these ideas in 2017: see the Prior Art section ;-)

The PEP can be read on GitHub where it's better formatted:
https://github.com/vstinner/misc/blob/master/cpython/pep-opaque-c-api.rst

If someone wants to work on the PEP itself, the document on GitHub is
the current reference.

Victor



PEP xxx: Modify the C API to hide implementation details
=========================================================

Abstract
========

* Hide implementation details from the C API to be able to `optimize
  CPython`_ and make PyPy more efficient.
* The expectation is that `most C extensions don't rely directly on
  CPython internals`_ and so will remain compatible.
* Continue to support old unmodified C extensions by continuing to
  provide the fully compatible "regular" CPython runtime.
* Provide a `new optimized CPython runtime`_ using the same CPython code
  base: faster but can only import C extensions which don't use
  implementation details. Since both CPython runtimes share the same
  code base, features implemented in CPython will be available in both
  runtimes.
* `Stable ABI`_: Only build a C extension once and use it on multiple
  Python runtimes and different versions of the same runtime.
* Better advertise alternative Python runtimes and better communicate on
  the differences between the Python language and the Python
  implementation (especially CPython).

Note: Cython and cffi should be preferred to write new C extensions.
This PEP is about existing C extensions which cannot be rewritten with
Cython.


Rationale
=

To remain competitive in terms of performance with other programming
languages like Go or Rust, Python has to become more efficient.

Make Python (at least) two times faster
---------------------------------------

The C API leaks too many implementation details which prevent optimizing
CPython. See `Optimize CPython`_.

PyPy's support for Python's C API (cpyext) is slow because it has to
emulate CPython internals like memory layout and reference counting. The
emulation causes memory overhead, memory copies, conversions, etc. See
`Inside cpyext: Why emulating CPython C API is so Hard
`_
(Sept 2018) by Antonio Cuni.

While this PEP may make CPython a little bit slower in the short term,
the long-term goal is to make "Python" at least two times faster. This
goal is not hypothetical: PyPy is already 4.2x faster than CPython and
is fully compatible. C extensions are the bottleneck of PyPy. This PEP
proposes a migration plan to move towards an opaque C API which would
make PyPy faster.

Separate the Python language from the CPython runtime (promote alternative runtimes)
-------------------------------------------------------------------------------------


The Python language should be better separated from its runtime. It's
common to say "Python" when referring to "CPython". Even in this PEP :-)

Because the CPython runtime remains the reference implementation, many
people believe that the Python language itself has design flaws which
prevent it from being efficient. PyPy proved that this is a false
assumption: on average, PyPy runs Python code 4.2 times faster than
CPython.

One solution for separating the language from the implementation is to
promote the usage of alternative runtimes: not only provide the regular
CPython, but also PyPy, optimized CPython which is only compatible with
C extensions using the limited C API, CPython compiled in debug mode to
ease debugging issues in C extensions, RustPython, etc.

To make alternative runtimes viable, they should be competitive in
terms of features and performance. Currently, C extension modules
remain the bottleneck for PyPy.

Most C extensions don't rely directly on CPython internals
----------------------------------------------------------

While the C API is still tightly coupled to CPython internals, in
practice most C extensions don't rely directly on CPython internals.

The expectation is that these C extensions will remain compatible with
an "opaque" C API and that only a minority of C extensions will have
to be modified.

Moreover, more and more C extensions are implemented in Cython or cffi.
Updating Cython and cffi to be compatible with the