Hi all,

I was hoping to get some feedback on a proposed refactoring of the
datetime module that should dramatically improve import performance.

The datetime module is implemented more or less in full both in pure
Python and in C. The way that this is currently achieved
<https://github.com/python/cpython/blob/5a2bac7fe0e7a2b67fd57c7a9176a50feed0d7a0/Lib/datetime.py#L7>
is that the pure Python implementation is defined in datetime.py and
the C implementation is in _datetime. /After/ the full Python
version is defined, the C version is star-imported, so any symbols
defined in both versions are taken from the C version. If the C version
is used, any private symbols used only in the pure Python implementation
are manually deleted (see the end of the file
<https://github.com/python/cpython/blob/5a2bac7fe0e7a2b67fd57c7a9176a50feed0d7a0/Lib/datetime.py#L2503-L2522>).
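
To make the pattern concrete, here is a minimal, self-contained sketch
of "define everything in Python, then star-import the accelerator and
delete the leftovers". It is not the actual datetime.py code: the names
`_demo_accel`, `demo`, `add` and `_check_int` are made up for
illustration, and the "C" module is faked with a plain module object.

```python
import sys
import types

# Hypothetical stand-in for a C accelerator such as _datetime.
accel = types.ModuleType("_demo_accel")
accel.add = lambda a, b: a + b
accel.__all__ = ["add"]
sys.modules["_demo_accel"] = accel

SOURCE = '''
def _check_int(x):
    # Private helper used only by the pure Python implementation.
    if not isinstance(x, int):
        raise TypeError(x)

def add(a, b):
    # Pure Python version: always defined, even if it is replaced below.
    _check_int(a)
    _check_int(b)
    return a + b

try:
    from _demo_accel import *   # overrides add() when the accelerator exists
except ImportError:
    pass
else:
    del _check_int              # remove helpers the fast version never uses
'''

# Execute the source as if it were the body of a module named "demo".
demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)
```

Note that the pure Python `add` and `_check_int` are still compiled and
created before being thrown away, which is the overhead in question.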

This adds a lot of unnecessary overhead, both to define a bunch of
unused classes and functions and to import modules that are required for
the pure Python implementation but not for the C implementation. In the
issue he created about this <https://bugs.python.org/issue40799>, Victor
Stinner demonstrated that moving the pure Python implementation to its
own module would speed up the import of datetime by a factor of 4.
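
For anyone who wants to look at the import cost on their own build, one
possible way to measure it (not necessarily the method used in the
issue) is CPython's `-X importtime` option, which writes a per-module
timing report to stderr. The helper name `import_time_report` below is
just something made up for this sketch.

```python
import subprocess
import sys

def import_time_report(module: str) -> str:
    """Return the -X importtime report for importing `module` in a fresh interpreter."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The timing report is written to stderr, not stdout.
    return result.stderr

report = import_time_report("datetime")
print(report)
```

The numbers vary a lot between machines and runs, so it is best to
compare builds on the same machine and repeat the measurement.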

I think that we should indeed move the pure Python implementation into
its own module, despite the fact that this is almost guaranteed to break
some people who are either relying on implementation details or doing
something funky with the import system. I don't think it should break
anyone relying on the guaranteed public interface. The issue at hand is that we
have two options available for the refactoring: either move the pure
Python implementation to its own private top-level module (single file)
such as `_pydatetime`, or make `datetime` a folder with an `__init__.py`
and move the pure Python implementation to `datetime._pydatetime` or
something of that nature.
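
A rough sketch of the two layouts, using throwaway names so that
nothing shadows the real datetime: `flatdemo` / `_pyflatdemo` stand in
for the single-file option, `pkgdemo` / `pkgdemo._pypkgdemo` for the
folder option, and `_demo_accel_missing` is a deliberately nonexistent
"C" module so that the fallback path is taken. All of these names are
assumptions for illustration only.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Thin wrapper: prefer the accelerator, fall back to the pure Python module.
WRAPPER = (
    "try:\n"
    "    from _demo_accel_missing import *\n"   # hypothetical C module, absent here
    "except ImportError:\n"
    "    from {fallback} import *\n"
)
PURE = "def today():\n    return 'pure-python today'\n"

root = Path(tempfile.mkdtemp())

# Option 1: two single-file modules side by side.
(root / "flatdemo.py").write_text(WRAPPER.format(fallback="_pyflatdemo"))
(root / "_pyflatdemo.py").write_text(PURE)

# Option 2: a folder whose __init__.py picks the implementation.
pkg = root / "pkgdemo"
pkg.mkdir()
(pkg / "__init__.py").write_text(WRAPPER.format(fallback="pkgdemo._pypkgdemo"))
(pkg / "_pypkgdemo.py").write_text(PURE)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
flat = importlib.import_module("flatdemo")
pkgmod = importlib.import_module("pkgdemo")

print(flat.today(), "|", pkgmod.today())
print(hasattr(flat, "__path__"), hasattr(pkgmod, "__path__"))
```

Both behave the same through the public interface. The visible
differences are things like `__path__` and `__file__`, which is exactly
the kind of detail that code doing something funky with the import
system might notice.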

The decimal and zoneinfo modules both have this same issue. The decimal
module uses the first strategy with _pydecimal and decimal, while the
zoneinfo module uses a folder with a zoneinfo._zoneinfo submodule. Assuming
we go forward with this, we need to decide which strategy to adopt for
datetime.

In favor of using a datetime/ folder, I'd say it's cleaner to put the
pure Python implementation of datetime under the datetime namespace, and
also it gives us more freedom to play with the module's structure in the
future, since we could have lazily-imported sub-components, or we could
implement some logic common to both implementations in Python and import
it from a `datetime._common` module without requiring the C version to
import the entire Python version, similar to the way zoneinfo has the
zoneinfo._common
<https://github.com/python/cpython/blob/master/Lib/zoneinfo/_common.py>
module.
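
As one illustration of what "lazily-imported sub-components" could look
like, here is a sketch using a module-level `__getattr__` (PEP 562).
This is only one possible mechanism, not something proposed in the PR.
The module name `lazydemo` is made up, and `fractions.Fraction` is used
purely as a stand-in for some expensive component.

```python
import types

SOURCE = '''
import importlib

_LAZY = {"Fraction": "fractions"}   # public name -> module that provides it

def __getattr__(name):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    if name in _LAZY:
        value = getattr(importlib.import_module(_LAZY[name]), name)
        globals()[name] = value      # cache so later lookups skip this hook
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

# Execute the source as if it were the __init__.py of a package.
lazydemo = types.ModuleType("lazydemo")
exec(SOURCE, lazydemo.__dict__)

before = "Fraction" in vars(lazydemo)   # not loaded yet
half = lazydemo.Fraction(1, 2)          # first access triggers the import
after = "Fraction" in vars(lazydemo)    # now cached in the module namespace
```

This sort of hook would live in the `__init__.py` with the folder
layout, which is also part of why that layout has more ways to surprise
people.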

The downside of the folder method is that it complicates the way
datetime is imported, /especially/ if we add additional structure to
the module or add any logic into the __init__.py. Having two single-file
modules side by side, one imported by the other, doesn't change anything
about the nature of how the datetime module is imported, and is much
less likely to break anything.

Anyone have thoughts or strong preferences here? Anyone have use cases
where one approach or the other is likely to cause a bunch of undue
hardship? I'd like to avoid moving this more than once.

Best,
Paul

P.S. Victor's PR moving this code to _pydatetime
<https://github.com/python/cpython/pull/20472> is currently done in such
a way that the ability to backport changes from post-refactoring to
pre-refactoring branches is preserved; I have not checked, but I /think/
we should be able to do the same thing with the other strategy as well.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CCI7PDAL6G67XVVRKPP2FAYJ5YZYHTK3/
Code of Conduct: http://python.org/psf/codeofconduct/