On Mon, 16 May 2016 at 13:12 Guido van Rossum <gu...@python.org> wrote:
> Once you assign yourself a PEP number I'll do one more pass and then I > expect to accept it -- the draft looks good to me! > Done: https://hg.python.org/peps/rev/b41cb718054a > > On Mon, May 16, 2016 at 1:00 PM, Brett Cannon <br...@python.org> wrote: > >> Recent discussions have been about type hints which are orthogonal to the >> PEP, so things have seemed to have reached a steady state. >> >> Was there anything else that needed clarification, Guido, or are you >> ready to pronounce? Or did you want to wait until the language summit? Or >> did you want to assign a BDFL delegate? >> >> >> On Fri, 13 May 2016 at 11:37 Brett Cannon <br...@python.org> wrote: >> >>> Biggest changes since the second draft: >>> >>> 1. Resolve __fspath__() from the type, not the instance (for Guido) >>> 2. Updated the TypeError messages to say "os.PathLike object" >>> instead of "path object" (implicitly for Steven) >>> 3. TODO item to define "path-like" in the glossary (for Steven) >>> 4. Various more things added to Rejected Ideas >>> 5. Added Koos as a co-author (for Koos :) >>> >>> ---------- >>> PEP: NNN >>> Title: Adding a file system path protocol >>> Version: $Revision$ >>> Last-Modified: $Date$ >>> Author: Brett Cannon <br...@python.org>, >>> Koos Zevenhoven <k7ho...@gmail.com> >>> Status: Draft >>> Type: Standards Track >>> Content-Type: text/x-rst >>> Created: 11-May-2016 >>> Post-History: 11-May-2016, >>> 12-May-2016, >>> 13-May-2016 >>> >>> >>> Abstract >>> ======== >>> >>> This PEP proposes a protocol for classes which represent a file system >>> path to be able to provide a ``str`` or ``bytes`` representation. >>> Changes to Python's standard library are also proposed to utilize this >>> protocol where appropriate to facilitate the use of path objects where >>> historically only ``str`` and/or ``bytes`` file system paths are >>> accepted. The goal is to facilitate the migration of users towards >>> rich path objects while providing an easy way to work with code >>> expecting ``str`` or ``bytes``. >>> >>> >>> Rationale >>> ========= >>> >>> Historically in Python, file system paths have been represented as >>> strings or bytes. This choice of representation has stemmed from C's >>> own decision to represent file system paths as >>> ``const char *`` [#libc-open]_. While that is a totally serviceable >>> format to use for file system paths, it's not necessarily optimal. At >>> issue is the fact that while all file system paths can be represented >>> as strings or bytes, not all strings or bytes represent a file system >>> path. This can lead to issues where any e.g. string duck-types to a >>> file system path whether it actually represents a path or not. >>> >>> To help elevate the representation of file system paths from their >>> representation as strings and bytes to a richer object representation, >>> the pathlib module [#pathlib]_ was provisionally introduced in >>> Python 3.4 through PEP 428. While considered by some as an improvement >>> over strings and bytes for file system paths, it has suffered from a >>> lack of adoption. Typically the key issue listed for the low adoption >>> rate has been the lack of support in the standard library. This lack >>> of support required users of pathlib to manually convert path objects >>> to strings by calling ``str(path)`` which many found error-prone. >>> >>> One issue in converting path objects to strings comes from >>> the fact that the only generic way to get a string representation of >>> the path was to pass the object to ``str()``. This can pose a >>> problem when done blindly as nearly all Python objects have some >>> string representation whether they are a path or not, e.g. >>> ``str(None)`` will give a result that >>> ``builtins.open()`` [#builtins-open]_ will happily use to create a new >>> file. >>> >>> Exacerbating this whole situation is the >>> ``DirEntry`` object [#os-direntry]_. While path objects have a >>> representation that can be extracted using ``str()``, ``DirEntry`` >>> objects expose a ``path`` attribute instead. Having no common >>> interface between path objects, ``DirEntry``, and any other >>> third-party path library has become an issue. A solution that allows >>> any path-representing object to declare that it is a path and a way >>> to extract a low-level representation that all path objects could >>> support is desired. >>> >>> This PEP then proposes to introduce a new protocol to be followed by >>> objects which represent file system paths. Providing a protocol allows >>> for explicit signaling of what objects represent file system paths as >>> well as a way to extract a lower-level representation that can be used >>> with older APIs which only support strings or bytes. >>> >>> Discussions regarding path objects that led to this PEP can be found >>> in multiple threads on the python-ideas mailing list archive >>> [#python-ideas-archive]_ for the months of March and April 2016 and on >>> the python-dev mailing list archives [#python-dev-archive]_ during >>> April 2016. >>> >>> >>> Proposal >>> ======== >>> >>> This proposal is split into two parts. One part is the proposal of a >>> protocol for objects to declare and provide support for exposing a >>> file system path representation. The other part deals with changes to >>> Python's standard library to support the new protocol. These changes >>> will also lead to the pathlib module dropping its provisional status. >>> >>> Protocol >>> -------- >>> >>> The following abstract base class defines the protocol for an object >>> to be considered a path object:: >>> >>> import abc >>> import typing as t >>> >>> >>> class PathLike(abc.ABC): >>> >>> """Abstract base class for implementing the file system path >>> protocol.""" >>> >>> @abc.abstractmethod >>> def __fspath__(self) -> t.Union[str, bytes]: >>> """Return the file system path representation of the >>> object.""" >>> raise NotImplementedError >>> >>> >>> Objects representing file system paths will implement the >>> ``__fspath__()`` method which will return the ``str`` or ``bytes`` >>> representation of the path. The ``str`` representation is the >>> preferred low-level path representation as it is human-readable and >>> what people historically represent paths as. >>> >>> >>> Standard library changes >>> ------------------------ >>> >>> It is expected that most APIs in Python's standard library that >>> currently accept a file system path will be updated appropriately to >>> accept path objects (whether that requires code or simply an update >>> to documentation will vary). The modules mentioned below, though, >>> deserve specific details as they have either fundamental changes that >>> empower the ability to use path objects, or entail additions/removal >>> of APIs. >>> >>> >>> builtins >>> '''''''' >>> >>> ``open()`` [#builtins-open]_ will be updated to accept path objects as >>> well as continue to accept ``str`` and ``bytes``. >>> >>> >>> os >>> ''' >>> >>> The ``fspath()`` function will be added with the following semantics:: >>> >>> import typing as t >>> >>> >>> def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, >>> bytes]: >>> """Return the string representation of the path. >>> >>> If str or bytes is passed in, it is returned unchanged. >>> """ >>> if isinstance(path, (str, bytes)): >>> return path >>> >>> # Work from the object's type to match method resolution of >>> other magic >>> # methods. >>> path_type = type(path) >>> try: >>> return path_type.__fspath__(path) >>> except AttributeError: >>> if hasattr(path_type, '__fspath__'): >>> raise >>> >>> raise TypeError("expected str, bytes or os.PathLike object, >>> not " >>> + path_type.__name__) >>> >>> The ``os.fsencode()`` [#os-fsencode]_ and >>> ``os.fsdecode()`` [#os-fsdecode]_ functions will be updated to accept >>> path objects. As both functions coerce their arguments to >>> ``bytes`` and ``str``, respectively, they will be updated to call >>> ``__fspath__()`` if present to convert the path object to a ``str`` or >>> ``bytes`` representation, and then perform their appropriate >>> coercion operations as if the return value from ``__fspath__()`` had >>> been the original argument to the coercion function in question. >>> >>> The addition of ``os.fspath()``, the updates to >>> ``os.fsencode()``/``os.fsdecode()``, and the current semantics of >>> ``pathlib.PurePath`` provide the semantics necessary to >>> get the path representation one prefers. For a path object, >>> ``pathlib.PurePath``/``Path`` can be used. To obtain the ``str`` or >>> ``bytes`` representation without any coersion, then ``os.fspath()`` >>> can be used. If a ``str`` is desired and the encoding of ``bytes`` >>> should be assumed to be the default file system encoding, then >>> ``os.fsdecode()`` should be used. If a ``bytes`` representation is >>> desired and any strings should be encoded using the default file >>> system encoding, then ``os.fsencode()`` is used. This PEP recommends >>> using path objects when possible and falling back to string paths as >>> necessary and using ``bytes`` as a last resort. >>> >>> Another way to view this is as a hierarchy of file system path >>> representations (highest- to lowest-level): path → str → bytes. The >>> functions and classes under discussion can all accept objects on the >>> same level of the hierarchy, but they vary in whether they promote or >>> demote objects to another level. The ``pathlib.PurePath`` class can >>> promote a ``str`` to a path object. The ``os.fspath()`` function can >>> demote a path object to a ``str`` or ``bytes`` instance, depending >>> on what ``__fspath__()`` returns. >>> The ``os.fsdecode()`` function will demote a path object to >>> a string or promote a ``bytes`` object to a ``str``. The >>> ``os.fsencode()`` function will demote a path or string object to >>> ``bytes``. There is no function that provides a way to demote a path >>> object directly to ``bytes`` while bypassing string demotion. >>> >>> The ``DirEntry`` object [#os-direntry]_ will gain an ``__fspath__()`` >>> method. It will return the same value as currently found on the >>> ``path`` attribute of ``DirEntry`` instances. >>> >>> The Protocol_ ABC will be added to the ``os`` module under the name >>> ``os.PathLike``. >>> >>> >>> os.path >>> ''''''' >>> >>> The various path-manipulation functions of ``os.path`` [#os-path]_ >>> will be updated to accept path objects. For polymorphic functions that >>> accept both bytes and strings, they will be updated to simply use >>> ``os.fspath()``. >>> >>> During the discussions leading up to this PEP it was suggested that >>> ``os.path`` not be updated using an "explicit is better than implicit" >>> argument. The thinking was that since ``__fspath__()`` is polymorphic >>> itself it may be better to have code working with ``os.path`` extract >>> the path representation from path objects explicitly. There is also >>> the consideration that adding support this deep into the low-level OS >>> APIs will lead to code magically supporting path objects without >>> requiring any documentation updated, leading to potential complaints >>> when it doesn't work, unbeknownst to the project author. >>> >>> But it is the view of this PEP that "practicality beats purity" in >>> this instance. To help facilitate the transition to supporting path >>> objects, it is better to make the transition as easy as possible than >>> to worry about unexpected/undocumented duck typing support for >>> path objects by projects. >>> >>> There has also been the suggestion that ``os.path`` functions could be >>> used in a tight loop and the overhead of checking or calling >>> ``__fspath__()`` would be too costly. In this scenario only >>> path-consuming APIs would be directly updated and path-manipulating >>> APIs like the ones in ``os.path`` would go unmodified. This would >>> require library authors to update their code to support path objects >>> if they performed any path manipulations, but if the library code >>> passed the path straight through then the library wouldn't need to be >>> updated. It is the view of this PEP and Guido, though, that this is an >>> unnecessary worry and that performance will still be acceptable. >>> >>> >>> pathlib >>> ''''''' >>> >>> The constructor for ``pathlib.PurePath`` and ``pathlib.Path`` will be >>> updated to accept ``PathLike`` objects. Both ``PurePath`` and ``Path`` >>> will continue to not accept ``bytes`` path representations, and so if >>> ``__fspath__()`` returns ``bytes`` it will raise an exception. >>> >>> The ``path`` attribute will be removed as this PEP makes it >>> redundant (it has not been included in any released version of Python >>> and so is not a backwards-compatibility concern). >>> >>> >>> C API >>> ''''' >>> >>> The C API will gain an equivalent function to ``os.fspath()``:: >>> >>> /* >>> Return the file system path of the object. >>> >>> If the object is str or bytes, then allow it to pass through with >>> an incremented refcount. If the object defines __fspath__(), then >>> return the result of that method. All other types raise a >>> TypeError. >>> */ >>> PyObject * >>> PyOS_FSPath(PyObject *path) >>> { >>> if (PyUnicode_Check(path) || PyBytes_Check(path)) { >>> Py_INCREF(path); >>> return path; >>> } >>> >>> if (PyObject_HasAttrString(path->ob_type, "__fspath__")) { >>> return PyObject_CallMethodObjArgs(path->ob_type, >>> "__fspath__", path, >>> NULL); >>> } >>> >>> return PyErr_Format(PyExc_TypeError, >>> "expected a str, bytes, or os.PathLike >>> object, not %S", >>> path->ob_type); >>> } >>> >>> >>> >>> Backwards compatibility >>> ======================= >>> >>> There are no explicit backwards-compatibility concerns. Unless an >>> object incidentally already defines a ``__fspath__()`` method there is >>> no reason to expect the pre-existing code to break or expect to have >>> its semantics implicitly changed. >>> >>> Libraries wishing to support path objects and a version of Python >>> prior to Python 3.6 and the existence of ``os.fspath()`` can use the >>> idiom of >>> ``path.__fspath__() if hasattr(path, "__fspath__") else path``. >>> >>> >>> Implementation >>> ============== >>> >>> This is the task list for what this PEP proposes: >>> >>> #. Remove the ``path`` attribute from pathlib >>> #. Remove the provisional status of pathlib >>> #. Add ``os.PathLike`` >>> #. Add ``os.fspath()`` >>> #. Add ``PyOS_FSPath()`` >>> #. Update ``os.fsencode()`` >>> #. Update ``os.fsdecode()`` >>> #. Update ``pathlib.PurePath`` and ``pathlib.Path`` >>> #. Update ``builtins.open()`` >>> #. Update ``os.DirEntry`` >>> #. Update ``os.path`` >>> #. Add a glossary entry for "path-like" >>> >>> >>> Rejected Ideas >>> ============== >>> >>> Other names for the protocol's method >>> ------------------------------------- >>> >>> Various names were proposed during discussions leading to this PEP, >>> including ``__path__``, ``__pathname__``, and ``__fspathname__``. In >>> the end people seemed to gravitate towards ``__fspath__`` for being >>> unambiguous without being unnecessarily long. >>> >>> >>> Separate str/bytes methods >>> -------------------------- >>> >>> At one point it was suggested that ``__fspath__()`` only return >>> strings and another method named ``__fspathb__()`` be introduced to >>> return bytes. The thinking is that by making ``__fspath__()`` not be >>> polymorphic it could make dealing with the potential string or bytes >>> representations easier. But the general consensus was that returning >>> bytes will more than likely be rare and that the various functions in >>> the os module are the better abstraction to promote over direct >>> calls to ``__fspath__()``. >>> >>> >>> Providing a ``path`` attribute >>> ------------------------------ >>> >>> To help deal with the issue of ``pathlib.PurePath`` not inheriting >>> from ``str``, originally it was proposed to introduce a ``path`` >>> attribute to mirror what ``os.DirEntry`` provides. In the end, >>> though, it was determined that a protocol would provide the same >>> result while not directly exposing an API that most people will never >>> need to interact with directly. >>> >>> >>> Have ``__fspath__()`` only return strings >>> ------------------------------------------ >>> >>> Much of the discussion that led to this PEP revolved around whether >>> ``__fspath__()`` should be polymorphic and return ``bytes`` as well as >>> ``str`` or only return ``str``. The general sentiment for this view >>> was that ``bytes`` are difficult to work with due to their >>> inherent lack of information about their encoding and PEP 383 makes >>> it possible to represent all file system paths using ``str`` with the >>> ``surrogateescape`` handler. Thus, it would be better to forcibly >>> promote the use of ``str`` as the low-level path representation for >>> high-level path objects. >>> >>> In the end, it was decided that using ``bytes`` to represent paths is >>> simply not going to go away and thus they should be supported to some >>> degree. The hope is that people will gravitate towards path objects >>> like pathlib and that will move people away from operating directly >>> with ``bytes``. >>> >>> >>> A generic string encoding mechanism >>> ----------------------------------- >>> >>> At one point there was a discussion of developing a generic mechanism >>> to extract a string representation of an object that had semantic >>> meaning (``__str__()`` does not necessarily return anything of >>> semantic significance beyond what may be helpful for debugging). In >>> the end, it was deemed to lack a motivating need beyond the one this >>> PEP is trying to solve in a specific fashion. >>> >>> >>> Have __fspath__ be an attribute >>> ------------------------------- >>> >>> It was briefly considered to have ``__fspath__`` be an attribute >>> instead of a method. This was rejected for two reasons. One, >>> historically protocols have been implemented as "magic methods" and >>> not "magic methods and attributes". Two, there is no guarantee that >>> the lower-level representation of a path object will be pre-computed, >>> potentially misleading users that there was no expensive computation >>> behind the scenes in case the attribute was implemented as a property. >>> >>> This also indirectly ties into the idea of introducing a ``path`` >>> attribute to accomplish the same thing. This idea has an added issue, >>> though, of accidentally having any object with a ``path`` attribute >>> meet the protocol's duck typing. Introducing a new magic method for >>> the protocol helpfully avoids any accidental opting into the protocol. >>> >>> >>> Provide specific type hinting support >>> ------------------------------------- >>> >>> There was some consideration to provdinga generic ``typing.PathLike`` >>> class which would allow for e.g. ``typing.PathLike[str]`` to specify >>> a type hint for a path object which returned a string representation. >>> While potentially beneficial, the usefulness was deemed too small to >>> bother adding the type hint class. >>> >>> This also removed any desire to have a class in the ``typing`` module >>> which represented the union of all acceptable path-representing types >>> as that can be represented with >>> ``typing.Union[str, bytes, os.PathLike]`` easily enough and the hope >>> is users will slowly gravitate to path objects only. >>> >>> >>> Provide ``os.fspathb()`` >>> ------------------------ >>> >>> It was suggested that to mirror the structure of e.g. >>> ``os.getcwd()``/``os.getcwdb()``, that ``os.fspath()`` only return >>> ``str`` and that another function named ``os.fspathb()`` be >>> introduced that only returned ``bytes``. This was rejected as the >>> purposes of the ``*b()`` functions are tied to querying the file >>> system where there is a need to get the raw bytes back. As this PEP >>> does not work directly with data on a file system (but which *may* >>> be), the view was taken this distinction is unnecessary. It's also >>> believed that the need for only bytes will not be common enough to >>> need to support in such a specific manner as ``os.fsencode()`` will >>> provide similar functionality. >>> >>> >>> Call ``__fspath__()`` off of the instance >>> ----------------------------------------- >>> >>> An earlier draft of this PEP had ``os.fspath()`` calling >>> ``path.__fspath__()`` instead of ``type(path).__fspath__(path)``. The >>> changed to be consistent with how other magic methods in Python are >>> resolved. >>> >>> >>> Acknowledgements >>> ================ >>> >>> Thanks to everyone who participated in the various discussions related >>> to this PEP that spanned both python-ideas and python-dev. Special >>> thanks to Stephen Turnbull for direct feedback on early drafts of this >>> PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not >>> only feedback on early drafts of this PEP but also helping to drive >>> the overall discussion on this topic across the two mailing lists. >>> >>> >>> References >>> ========== >>> >>> .. [#python-ideas-archive] The python-ideas mailing list archive >>> (https://mail.python.org/pipermail/python-ideas/) >>> >>> .. [#python-dev-archive] The python-dev mailing list archive >>> (https://mail.python.org/pipermail/python-dev/) >>> >>> .. [#libc-open] ``open()`` documention for the C standard library >>> ( >>> http://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html >>> ) >>> >>> .. [#pathlib] The ``pathlib`` module >>> (https://docs.python.org/3/library/pathlib.html#module-pathlib) >>> >>> .. [#builtins-open] The ``builtins.open()`` function >>> (https://docs.python.org/3/library/functions.html#open) >>> >>> .. [#os-fsencode] The ``os.fsencode()`` function >>> (https://docs.python.org/3/library/os.html#os.fsencode) >>> >>> .. [#os-fsdecode] The ``os.fsdecode()`` function >>> (https://docs.python.org/3/library/os.html#os.fsdecode) >>> >>> .. [#os-direntry] The ``os.DirEntry`` class >>> (https://docs.python.org/3/library/os.html#os.DirEntry) >>> >>> .. [#os-path] The ``os.path`` module >>> (https://docs.python.org/3/library/os.path.html#module-os.path) >>> >>> >>> Copyright >>> ========= >>> >>> This document has been placed in the public domain. >>> >>> >>> .. >>> Local Variables: >>> mode: indented-text >>> indent-tabs-mode: nil >>> sentence-end-double-space: t >>> fill-column: 70 >>> coding: utf-8 >>> End: >>> >>> > > > -- > --Guido van Rossum (python.org/~guido) >
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com