STINNER Victor <vstin...@redhat.com> added the comment:

> It seems that the disagreement about the design is fundamentally a 
> disagreement between a "quick, painful but complete fix" and "slow, careful 
> improvements with a transition period". Both are valid approaches, and since 
> Victor is putting actual effort in right now he gets to "win", but I do think 
> we can afford to move faster.

Technically, the API already exists and is exposed as a private API:

* "_PyCoreConfig" structure
* "_PyInitError _Py_InitializeFromConfig(const _PyCoreConfig *config)" function
* "void _Py_FatalInitError(_PyInitError err)" function (to be called on failure)
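The pattern behind these functions is that initialization reports an error as a value instead of exiting directly, so the embedder decides what to do on failure. Below is a minimal sketch of that pattern. It assumes hypothetical names (InitError, INIT_FAILED, init_from_config, fatal_init_error) and is not the real _PyInitError definition:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an "init error" value: msg == NULL means success. */
typedef struct {
    const char *prefix;  /* name of the function that failed */
    const char *msg;     /* error message, or NULL on success */
} InitError;

#define INIT_OK()        ((InitError){NULL, NULL})
#define INIT_ERR(MSG)    ((InitError){__func__, (MSG)})
#define INIT_FAILED(ERR) ((ERR).msg != NULL)

/* Hypothetical configuration: -1 means "not set", 0 means off, 1 means on. */
typedef struct {
    int utf8_mode;
} Config;

/* Reports failure as a value instead of calling exit() itself. */
InitError init_from_config(const Config *config)
{
    if (config->utf8_mode < -1 || config->utf8_mode > 1) {
        return INIT_ERR("invalid utf8_mode value");
    }
    /* ... the real initialization work would happen here ... */
    return INIT_OK();
}

/* Called by the embedder only when it chooses to abort on failure. */
void fatal_init_error(InitError err)
{
    fprintf(stderr, "Fatal Python error: %s: %s\n", err.prefix, err.msg);
    exit(1);
}

/* Typical embedder code. */
int embedder_main(void)
{
    Config config = {.utf8_mode = 1};
    InitError err = init_from_config(&config);
    if (INIT_FAILED(err)) {
        fatal_init_error(err);  /* does not return */
    }
    return 0;
}
```

An embedder which does not want the process to exit can inspect err.msg and report the error its own way instead of calling fatal_init_error().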

I'm not really sure what the benefit would be compared to the current 
initialization API, which uses Py_xxx global configuration variables (e.g. 
Py_IgnoreEnvironmentFlag) and Py_Initialize().

_PyCoreConfig basically exposes *all* input parameters used to initialize 
Python, much more than just the global configuration variables and the few 
functions that can be called before Py_Initialize():
https://docs.python.org/dev/c-api/init.html


> Currently PEP 432 is the best description we have, and it looks like Victor 
> has been heading in that direction too (deliberately? I don't know :) ).

Well, it's a strange story. At the beginning, I had a very simple use case... 
it took me more or less one year to implement it :-) My use case was to add... 
a new -X utf8 command line option:

* parsing the command line requires decoding bytes using an encoding
* the encoding depends on the locale, environment variables and options on the 
command line
* environment variables depend on the command line (-E option)

If the UTF-8 Mode is enabled (PEP 540), the encoding must be set to UTF-8, the 
configuration read so far must be discarded, and the whole configuration (env 
vars, cmdline, etc.) must be read again from scratch :-)
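A minimal sketch of this "read, then maybe read everything again" loop, under heavy simplifying assumptions: the Inputs and Config structures and config_read() are hypothetical, decoding argv is only represented by recording which encoding was used, and the PYTHONUTF8 environment variable is only compared to "1":

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical inputs to the initialization. */
typedef struct {
    bool opt_E;                   /* -E: ignore PYTHON* environment variables */
    bool opt_X_utf8;              /* -X utf8 */
    const char *env_pythonutf8;   /* value of PYTHONUTF8, or NULL if unset */
    const char *locale_encoding;  /* encoding derived from the locale */
} Inputs;

/* Hypothetical, heavily simplified configuration. */
typedef struct {
    const char *encoding;  /* encoding used to decode the command line */
    bool ignore_environment;
    int utf8_mode;         /* 0: off, 1: on */
    int passes;            /* number of read passes (for illustration) */
} Config;

/* One pass: "decode" argv with `encoding`, then read env vars unless -E. */
static void config_read_once(Config *config, const Inputs *in,
                             const char *encoding)
{
    config->encoding = encoding;  /* real code would decode argv here */
    config->ignore_environment = in->opt_E;
    if (in->opt_X_utf8) {
        config->utf8_mode = 1;
    } else if (!config->ignore_environment && in->env_pythonutf8 != NULL) {
        config->utf8_mode = (strcmp(in->env_pythonutf8, "1") == 0);
    } else {
        config->utf8_mode = 0;
    }
}

/* If the UTF-8 Mode turns out to be enabled, clear everything and re-read. */
void config_read(Config *config, const Inputs *in)
{
    const char *encoding = in->locale_encoding;
    int passes = 0;
    for (;;) {
        *config = (Config){0};  /* start from scratch */
        config_read_once(config, in, encoding);
        passes++;
        if (config->utf8_mode == 1 && strcmp(encoding, "UTF-8") != 0) {
            encoding = "UTF-8";  /* the previous pass used the wrong encoding */
            continue;
        }
        break;
    }
    config->passes = passes;
}
```

With -X utf8 and a non-UTF-8 locale, two passes are needed; with -E and PYTHONUTF8=1, the environment variable is ignored and a single pass is enough.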

To be able to do that, I had to collect *every single* thing which has an 
impact on the Python initialization: everything that I moved into _PyCoreConfig.

... but I didn't want to break backward compatibility, so I had to keep 
support for the Py_xxx global configuration variables... and also for the few 
initialization functions like Py_SetPath() or Py_SetStandardStreamEncoding().

Later it became very dark: my goal became very unclear and I looked at PEP 
432 :-)

Well, I wanted to expose _PyCoreConfig somehow, so I looked at PEP 432 to 
see how it could be exposed.


> By necessity, it touches a lot of people's contributions to Python, but it 
> also has the potential to seriously improve even more people's ability to 
> _use_ Python (for example, I know an app that you all would recognize the 
> name of who is working on embedding Python right now and would _love_ certain 
> parts of this side of things to be improved).

_PyCoreConfig "API" makes some things way simpler. Maybe it was already 
possible to do them previously but it was really hard, or maybe it was just not 
possible.

If a _PyCoreConfig field is set, it takes priority over any other way to 
initialize the field: _PyCoreConfig has the highest priority.
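A minimal sketch of this priority rule, assuming a hypothetical Config structure where -1 means "not set by the embedder" and a hypothetical global variable standing in for a Py_xxx flag: only an unset field falls back to the other source.

```c
#include <assert.h>

/* Hypothetical stand-in for a Py_xxx global configuration variable. */
int Global_IgnoreEnvironmentFlag = 0;

/* Hypothetical config: -1 means "not set by the embedder". */
typedef struct {
    int ignore_environment;
} Config;

#define CONFIG_INIT {.ignore_environment = -1}

/* A field set explicitly in the config wins; only unset fields fall back. */
void config_init_ignore_environment(Config *config)
{
    if (config->ignore_environment < 0) {
        config->ignore_environment = (Global_IgnoreEnvironmentFlag != 0);
    }
}
```

Using a sentinel such as -1 is what makes "explicitly set to 0" distinguishable from "not set at all".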

For example, _PyCoreConfig makes it possible to completely bypass the code which 
computes sys.path (and related variables) by directly setting the "path 
configuration":

* nmodule_search_path, module_search_paths: list of sys.path paths
* executable: sys.executable
* prefix: sys.prefix
* base_prefix: sys.base_prefix
* exec_prefix: sys.exec_prefix
* base_exec_prefix: sys.base_exec_prefix
* (Windows only) dll_path: Windows DLL path

The code which initializes these fields is really complex. Without 
_PyCoreConfig, it's hard to make sure that these fields are initialized the 
way an embedder wants.
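A minimal sketch of "use the embedder's value if set, otherwise compute it", assuming a hypothetical PathConfig holding a subset of the fields listed above. The "computed" values are hypothetical placeholders for the real, much more complex search:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the path configuration. */
typedef struct {
    const char *executable;            /* sys.executable */
    const char *prefix;                /* sys.prefix */
    const char *exec_prefix;           /* sys.exec_prefix */
    size_t nmodule_search_path;        /* len(sys.path) */
    const char **module_search_paths;  /* sys.path */
} PathConfig;

/* Placeholder result of the (complex) sys.path computation. */
static const char *COMPUTED_PATHS[] = {"/usr/local/lib/python3.8"};

static const char *or_computed(const char *value, const char *computed)
{
    return value != NULL ? value : computed;
}

/* Fields set by the embedder are kept; only NULL fields are computed. */
void path_config_init(PathConfig *pc)
{
    pc->executable = or_computed(pc->executable, "/usr/local/bin/python3");
    pc->prefix = or_computed(pc->prefix, "/usr/local");
    pc->exec_prefix = or_computed(pc->exec_prefix, "/usr/local");
    if (pc->module_search_paths == NULL) {
        pc->module_search_paths = COMPUTED_PATHS;
        pc->nmodule_search_path =
            sizeof(COMPUTED_PATHS) / sizeof(COMPUTED_PATHS[0]);
    }
}
```

An embedder shipping its own stdlib directory would set prefix and module_search_paths itself, and the search code would never override them.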


> Nick, Victor, Eric, (others?) - are you interested in having a virtual 
> whiteboard session to brainstorm how the "perfect" initialization looks? And 
> probably a follow-up to brainstorm how to get there without breaking the 
> world? I don't think we're going to get to be in the same room anytime before 
> the language summit, and it would be awesome to have something concrete to 
> discuss there.

Sorry, I'm not sure about the API / structures, but when I discussed this with 
Eric Snow at the latest sprint, we identified different steps in the Python 
initialization:

* only use bytes (no encoding), no access to the filesystem (not needed at this 
point)
* encoding defined, can use Unicode
* use the filesystem
* configuration converted to Python objects
* Python is fully initialized
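These steps can be sketched as an ordered set of phases, where each capability is only allowed from a given phase on. This is only an illustration under assumptions: the InitPhase names and the "advance one step at a time" rule are hypothetical, not an existing CPython API.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    PHASE_BYTES_ONLY,    /* only bytes, no encoding, no filesystem access */
    PHASE_ENCODING_SET,  /* encoding defined, Unicode can be used */
    PHASE_FILESYSTEM,    /* the filesystem can be used */
    PHASE_PY_OBJECTS,    /* configuration converted to Python objects */
    PHASE_READY          /* Python is fully initialized */
} InitPhase;

typedef struct {
    InitPhase phase;
} Runtime;

bool can_decode_unicode(const Runtime *rt)
{
    return rt->phase >= PHASE_ENCODING_SET;
}

bool can_access_filesystem(const Runtime *rt)
{
    return rt->phase >= PHASE_FILESYSTEM;
}

/* Phases only move forward, one step at a time; returns false on misuse. */
bool runtime_advance(Runtime *rt, InitPhase next)
{
    if ((int)next != (int)rt->phase + 1) {
        return false;
    }
    rt->phase = next;
    return true;
}
```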

--

At one point, I experimented with reorganizing _PyCoreConfig and 
_PyMainInterpreterConfig to avoid redundancy: I added a _PyPreConfig which 
contains only the fields needed before _PyMainInterpreterConfig. With that 
change, _PyMainInterpreterConfig (and _PyPreConfig) *contained* _PyCoreConfig.

But the change became very large and I wasn't sure that it was a good idea, so 
I abandoned it.

* https://github.com/python/cpython/pull/10575
* https://bugs.python.org/issue35266
* I have a more advanced version in this branch of my fork: 
https://github.com/vstinner/cpython/commits/pre_config_next

--

Ok, something else. _PyCoreConfig (and _PyMainInterpreterConfig) contain memory 
allocated on the heap. Problem: Python initialization changes the memory 
allocator. Code using _PyCoreConfig requires some "tricks" to ensure that 
memory is *freed* with the same allocator that was used to *allocate* it.
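A minimal sketch of one such trick, with hypothetical names throughout: the config remembers which allocator allocated its fields, so that clearing it later uses that allocator even if the "current" allocator was replaced in between. The counting debug allocator only exists to make a mismatch visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator: a pair of functions that must be used together. */
typedef struct {
    void *(*alloc)(size_t size);
    void (*dealloc)(void *ptr);
} Allocator;

/* A counting allocator, so that a mismatched free would be visible. */
static int debug_live_blocks = 0;

static void *debug_alloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL) {
        debug_live_blocks++;
    }
    return ptr;
}

static void debug_dealloc(void *ptr)
{
    if (ptr != NULL) {
        debug_live_blocks--;
        free(ptr);
    }
}

static const Allocator SYSTEM_ALLOCATOR = {malloc, free};
static const Allocator DEBUG_ALLOCATOR = {debug_alloc, debug_dealloc};

/* The "current" allocator, which initialization may replace at any time. */
static const Allocator *current_allocator = &SYSTEM_ALLOCATOR;

typedef struct {
    const Allocator *allocator;  /* allocator that owns the fields below */
    char *program_name;
} Config;

void config_clear(Config *config)
{
    if (config->allocator != NULL) {
        /* Free with the allocator used at allocation time,
           not with whatever allocator is current now. */
        config->allocator->dealloc(config->program_name);
    }
    config->program_name = NULL;
    config->allocator = NULL;
}

int config_set_program_name(Config *config, const char *name)
{
    config_clear(config);
    size_t size = strlen(name) + 1;
    char *copy = current_allocator->alloc(size);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, name, size);
    config->program_name = copy;
    config->allocator = current_allocator;  /* remember who allocated */
    return 0;
}
```

If config_clear() used current_allocator instead, the block allocated by the debug allocator would be released by the system allocator and the counter would never go back to zero.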

I created bpo-35265 "Internal C API: pass the memory allocator in a context" to 
pass a "context" to many functions: a context which contains the memory 
allocator but could contain more things later.

The idea of "a context" came up during the discussion about a new C API: stop 
relying on any global variable or "shared state", and instead *explicitly* pass 
a context to all functions. With that, it becomes possible to imagine two 
interpreters running in the same thread "at the same time".
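A minimal sketch of the "explicit context" idea, assuming a hypothetical Context structure: every function receives the state it needs as a parameter instead of reading a global variable, so two independent contexts can be used from the same thread. The recursion counter is only an illustrative example of state that would otherwise be global; a memory allocator could be another field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interpreter context: no global variable involved. */
typedef struct {
    int recursion_depth;
    int recursion_limit;
} Context;

/* The context is passed explicitly to every function that needs state. */
bool ctx_enter_call(Context *ctx)
{
    if (ctx->recursion_depth >= ctx->recursion_limit) {
        return false;  /* limit reached for this context only */
    }
    ctx->recursion_depth++;
    return true;
}

void ctx_leave_call(Context *ctx)
{
    ctx->recursion_depth--;
}
```

Hitting the limit in one context has no effect on another one, which would not be true if the depth were stored in a global variable.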

Honestly, I'm not really sure that this idea can be fully implemented... 
Python has *so much* "shared state", like *everywhere*. It's really a giant 
project to move all this shared state into structures and pass pointers to 
these structures.

So again, I abandoned my experimental change:
https://github.com/python/cpython/pull/10574

--

Memory allocator, context, different structures for configuration... it's 
really not an easy topic :-( There are so many constraints put into a single 
API!

The conservative option at this point is to keep the API private.

... Maybe we can explain how to use the private API but very explicitly warn 
that this API is experimental and can be broken at any time... And I plan to 
break it, for example to avoid redundancy between the core and main 
configurations.

... I hope that these explanations give you a better idea of the big picture 
and the challenges :-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue22213>
_______________________________________