Once you assign yourself a PEP number I'll do one more pass and then I expect to accept it -- the draft looks good to me!
On Mon, May 16, 2016 at 1:00 PM, Brett Cannon <br...@python.org> wrote:
> Recent discussions have been about type hints which are orthogonal to the
> PEP, so things have seemed to have reached a steady state.
>
> Was there anything else that needed clarification, Guido, or are you ready
> to pronounce? Or did you want to wait until the language summit? Or did you
> want to assign a BDFL delegate?
>
>
> On Fri, 13 May 2016 at 11:37 Brett Cannon <br...@python.org> wrote:
>
>> Biggest changes since the second draft:
>>
>> 1. Resolve __fspath__() from the type, not the instance (for Guido)
>> 2. Updated the TypeError messages to say "os.PathLike object" instead
>>    of "path object" (implicitly for Steven)
>> 3. TODO item to define "path-like" in the glossary (for Steven)
>> 4. Various more things added to Rejected Ideas
>> 5. Added Koos as a co-author (for Koos :)
>>
>> ----------
>> PEP: NNN
>> Title: Adding a file system path protocol
>> Version: $Revision$
>> Last-Modified: $Date$
>> Author: Brett Cannon <br...@python.org>,
>>         Koos Zevenhoven <k7ho...@gmail.com>
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/x-rst
>> Created: 11-May-2016
>> Post-History: 11-May-2016,
>>               12-May-2016,
>>               13-May-2016
>>
>>
>> Abstract
>> ========
>>
>> This PEP proposes a protocol for classes which represent a file system
>> path to be able to provide a ``str`` or ``bytes`` representation.
>> Changes to Python's standard library are also proposed to utilize this
>> protocol where appropriate to facilitate the use of path objects where
>> historically only ``str`` and/or ``bytes`` file system paths are
>> accepted. The goal is to facilitate the migration of users towards
>> rich path objects while providing an easy way to work with code
>> expecting ``str`` or ``bytes``.
>>
>>
>> Rationale
>> =========
>>
>> Historically in Python, file system paths have been represented as
>> strings or bytes.
>> This choice of representation has stemmed from C's
>> own decision to represent file system paths as
>> ``const char *`` [#libc-open]_. While that is a totally serviceable
>> format to use for file system paths, it's not necessarily optimal. At
>> issue is the fact that while all file system paths can be represented
>> as strings or bytes, not all strings or bytes represent a file system
>> path. This can lead to issues where any e.g. string duck-types to a
>> file system path whether it actually represents a path or not.
>>
>> To help elevate the representation of file system paths from their
>> representation as strings and bytes to a richer object representation,
>> the pathlib module [#pathlib]_ was provisionally introduced in
>> Python 3.4 through PEP 428. While considered by some as an improvement
>> over strings and bytes for file system paths, it has suffered from a
>> lack of adoption. Typically the key issue listed for the low adoption
>> rate has been the lack of support in the standard library. This lack
>> of support required users of pathlib to manually convert path objects
>> to strings by calling ``str(path)`` which many found error-prone.
>>
>> One issue in converting path objects to strings comes from
>> the fact that the only generic way to get a string representation of
>> the path was to pass the object to ``str()``. This can pose a
>> problem when done blindly as nearly all Python objects have some
>> string representation whether they are a path or not, e.g.
>> ``str(None)`` will give a result that
>> ``builtins.open()`` [#builtins-open]_ will happily use to create a new
>> file.
>>
>> Exacerbating this whole situation is the
>> ``DirEntry`` object [#os-direntry]_. While path objects have a
>> representation that can be extracted using ``str()``, ``DirEntry``
>> objects expose a ``path`` attribute instead. Having no common
>> interface between path objects, ``DirEntry``, and any other
>> third-party path library has become an issue.
>> A solution that allows
>> any path-representing object to declare that it is a path and a way
>> to extract a low-level representation that all path objects could
>> support is desired.
>>
>> This PEP then proposes to introduce a new protocol to be followed by
>> objects which represent file system paths. Providing a protocol allows
>> for explicit signaling of what objects represent file system paths as
>> well as a way to extract a lower-level representation that can be used
>> with older APIs which only support strings or bytes.
>>
>> Discussions regarding path objects that led to this PEP can be found
>> in multiple threads on the python-ideas mailing list archive
>> [#python-ideas-archive]_ for the months of March and April 2016 and on
>> the python-dev mailing list archives [#python-dev-archive]_ during
>> April 2016.
>>
>>
>> Proposal
>> ========
>>
>> This proposal is split into two parts. One part is the proposal of a
>> protocol for objects to declare and provide support for exposing a
>> file system path representation. The other part deals with changes to
>> Python's standard library to support the new protocol. These changes
>> will also lead to the pathlib module dropping its provisional status.
>>
>> Protocol
>> --------
>>
>> The following abstract base class defines the protocol for an object
>> to be considered a path object::
>>
>>     import abc
>>     import typing as t
>>
>>
>>     class PathLike(abc.ABC):
>>
>>         """Abstract base class for implementing the file system path protocol."""
>>
>>         @abc.abstractmethod
>>         def __fspath__(self) -> t.Union[str, bytes]:
>>             """Return the file system path representation of the object."""
>>             raise NotImplementedError
>>
>>
>> Objects representing file system paths will implement the
>> ``__fspath__()`` method which will return the ``str`` or ``bytes``
>> representation of the path.
>> The ``str`` representation is the
>> preferred low-level path representation as it is human-readable and
>> what people historically represent paths as.
>>
>>
>> Standard library changes
>> ------------------------
>>
>> It is expected that most APIs in Python's standard library that
>> currently accept a file system path will be updated appropriately to
>> accept path objects (whether that requires code or simply an update
>> to documentation will vary). The modules mentioned below, though,
>> deserve specific details as they have either fundamental changes that
>> empower the ability to use path objects, or entail additions/removal
>> of APIs.
>>
>>
>> builtins
>> ''''''''
>>
>> ``open()`` [#builtins-open]_ will be updated to accept path objects as
>> well as continue to accept ``str`` and ``bytes``.
>>
>>
>> os
>> '''
>>
>> The ``fspath()`` function will be added with the following semantics::
>>
>>     import typing as t
>>
>>
>>     def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
>>         """Return the string representation of the path.
>>
>>         If str or bytes is passed in, it is returned unchanged.
>>         """
>>         if isinstance(path, (str, bytes)):
>>             return path
>>
>>         # Work from the object's type to match method resolution of other magic
>>         # methods.
>>         path_type = type(path)
>>         try:
>>             return path_type.__fspath__(path)
>>         except AttributeError:
>>             if hasattr(path_type, '__fspath__'):
>>                 raise
>>
>>             raise TypeError("expected str, bytes or os.PathLike object, not "
>>                             + path_type.__name__)
>>
>> The ``os.fsencode()`` [#os-fsencode]_ and
>> ``os.fsdecode()`` [#os-fsdecode]_ functions will be updated to accept
>> path objects.
>> As both functions coerce their arguments to
>> ``bytes`` and ``str``, respectively, they will be updated to call
>> ``__fspath__()`` if present to convert the path object to a ``str`` or
>> ``bytes`` representation, and then perform their appropriate
>> coercion operations as if the return value from ``__fspath__()`` had
>> been the original argument to the coercion function in question.
>>
>> The addition of ``os.fspath()``, the updates to
>> ``os.fsencode()``/``os.fsdecode()``, and the current semantics of
>> ``pathlib.PurePath`` provide the semantics necessary to
>> get the path representation one prefers. For a path object,
>> ``pathlib.PurePath``/``Path`` can be used. To obtain the ``str`` or
>> ``bytes`` representation without any coercion, then ``os.fspath()``
>> can be used. If a ``str`` is desired and the encoding of ``bytes``
>> should be assumed to be the default file system encoding, then
>> ``os.fsdecode()`` should be used. If a ``bytes`` representation is
>> desired and any strings should be encoded using the default file
>> system encoding, then ``os.fsencode()`` is used. This PEP recommends
>> using path objects when possible and falling back to string paths as
>> necessary and using ``bytes`` as a last resort.
>>
>> Another way to view this is as a hierarchy of file system path
>> representations (highest- to lowest-level): path → str → bytes. The
>> functions and classes under discussion can all accept objects on the
>> same level of the hierarchy, but they vary in whether they promote or
>> demote objects to another level. The ``pathlib.PurePath`` class can
>> promote a ``str`` to a path object. The ``os.fspath()`` function can
>> demote a path object to a ``str`` or ``bytes`` instance, depending
>> on what ``__fspath__()`` returns.
>> The ``os.fsdecode()`` function will demote a path object to
>> a string or promote a ``bytes`` object to a ``str``. The
>> ``os.fsencode()`` function will demote a path or string object to
>> ``bytes``.
>> There is no function that provides a way to demote a path
>> object directly to ``bytes`` while bypassing string demotion.
>>
>> The ``DirEntry`` object [#os-direntry]_ will gain an ``__fspath__()``
>> method. It will return the same value as currently found on the
>> ``path`` attribute of ``DirEntry`` instances.
>>
>> The Protocol_ ABC will be added to the ``os`` module under the name
>> ``os.PathLike``.
>>
>>
>> os.path
>> '''''''
>>
>> The various path-manipulation functions of ``os.path`` [#os-path]_
>> will be updated to accept path objects. For polymorphic functions that
>> accept both bytes and strings, they will be updated to simply use
>> ``os.fspath()``.
>>
>> During the discussions leading up to this PEP it was suggested that
>> ``os.path`` not be updated using an "explicit is better than implicit"
>> argument. The thinking was that since ``__fspath__()`` is polymorphic
>> itself it may be better to have code working with ``os.path`` extract
>> the path representation from path objects explicitly. There is also
>> the consideration that adding support this deep into the low-level OS
>> APIs will lead to code magically supporting path objects without
>> requiring any documentation updates, leading to potential complaints
>> when it doesn't work, unbeknownst to the project author.
>>
>> But it is the view of this PEP that "practicality beats purity" in
>> this instance. To help facilitate the transition to supporting path
>> objects, it is better to make the transition as easy as possible than
>> to worry about unexpected/undocumented duck typing support for
>> path objects by projects.
>>
>> There has also been the suggestion that ``os.path`` functions could be
>> used in a tight loop and the overhead of checking or calling
>> ``__fspath__()`` would be too costly. In this scenario only
>> path-consuming APIs would be directly updated and path-manipulating
>> APIs like the ones in ``os.path`` would go unmodified.
>> This would
>> require library authors to update their code to support path objects
>> if they performed any path manipulations, but if the library code
>> passed the path straight through then the library wouldn't need to be
>> updated. It is the view of this PEP and Guido, though, that this is an
>> unnecessary worry and that performance will still be acceptable.
>>
>>
>> pathlib
>> '''''''
>>
>> The constructor for ``pathlib.PurePath`` and ``pathlib.Path`` will be
>> updated to accept ``PathLike`` objects. Both ``PurePath`` and ``Path``
>> will continue to not accept ``bytes`` path representations, and so if
>> ``__fspath__()`` returns ``bytes`` it will raise an exception.
>>
>> The ``path`` attribute will be removed as this PEP makes it
>> redundant (it has not been included in any released version of Python
>> and so is not a backwards-compatibility concern).
>>
>>
>> C API
>> '''''
>>
>> The C API will gain an equivalent function to ``os.fspath()``::
>>
>>     /*
>>         Return the file system path of the object.
>>
>>         If the object is str or bytes, then allow it to pass through with
>>         an incremented refcount. If the object defines __fspath__(), then
>>         return the result of that method. All other types raise a
>>         TypeError.
>>     */
>>     PyObject *
>>     PyOS_FSPath(PyObject *path)
>>     {
>>         if (PyUnicode_Check(path) || PyBytes_Check(path)) {
>>             Py_INCREF(path);
>>             return path;
>>         }
>>
>>         if (PyObject_HasAttrString(path->ob_type, "__fspath__")) {
>>             return PyObject_CallMethodObjArgs(path->ob_type,
>>                                               "__fspath__", path,
>>                                               NULL);
>>         }
>>
>>         return PyErr_Format(PyExc_TypeError,
>>                             "expected a str, bytes, or os.PathLike object, not %S",
>>                             path->ob_type);
>>     }
>>
>>
>>
>> Backwards compatibility
>> =======================
>>
>> There are no explicit backwards-compatibility concerns. Unless an
>> object incidentally already defines a ``__fspath__()`` method there is
>> no reason to expect the pre-existing code to break or expect to have
>> its semantics implicitly changed.
>>
>> Libraries wishing to support path objects and a version of Python
>> prior to Python 3.6 and the existence of ``os.fspath()`` can use the
>> idiom of
>> ``path.__fspath__() if hasattr(path, "__fspath__") else path``.
>>
>>
>> Implementation
>> ==============
>>
>> This is the task list for what this PEP proposes:
>>
>> #. Remove the ``path`` attribute from pathlib
>> #. Remove the provisional status of pathlib
>> #. Add ``os.PathLike``
>> #. Add ``os.fspath()``
>> #. Add ``PyOS_FSPath()``
>> #. Update ``os.fsencode()``
>> #. Update ``os.fsdecode()``
>> #. Update ``pathlib.PurePath`` and ``pathlib.Path``
>> #. Update ``builtins.open()``
>> #. Update ``os.DirEntry``
>> #. Update ``os.path``
>> #. Add a glossary entry for "path-like"
>>
>>
>> Rejected Ideas
>> ==============
>>
>> Other names for the protocol's method
>> -------------------------------------
>>
>> Various names were proposed during discussions leading to this PEP,
>> including ``__path__``, ``__pathname__``, and ``__fspathname__``. In
>> the end people seemed to gravitate towards ``__fspath__`` for being
>> unambiguous without being unnecessarily long.
>>
>>
>> Separate str/bytes methods
>> --------------------------
>>
>> At one point it was suggested that ``__fspath__()`` only return
>> strings and another method named ``__fspathb__()`` be introduced to
>> return bytes. The thinking is that by making ``__fspath__()`` not be
>> polymorphic it could make dealing with the potential string or bytes
>> representations easier. But the general consensus was that returning
>> bytes will more than likely be rare and that the various functions in
>> the os module are the better abstraction to promote over direct
>> calls to ``__fspath__()``.
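
As a rough illustration of the protocol and of the pre-3.6 fallback idiom from the Backwards compatibility section, here is a minimal sketch. ``Checkout`` and ``compat_fspath`` are hypothetical names invented for this example, and the bytes conversion assumes a Python where ``os.fsencode()`` already accepts path objects as proposed (3.6 or later).

```python
import os


class Checkout:
    """Hypothetical path-like object; not part of the PEP."""

    def __init__(self, root, name):
        self.root = root
        self.name = name

    def __fspath__(self):
        # Return the preferred low-level representation: a str.
        return self.root + "/" + self.name


def compat_fspath(path):
    """Hypothetical helper: prefer os.fspath(), else the PEP's idiom."""
    if hasattr(os, "fspath"):
        return os.fspath(path)
    # Fallback for interpreters that predate os.fspath().
    return path.__fspath__() if hasattr(path, "__fspath__") else path


checkout = Checkout("/srv/repos", "cpython")
as_str = compat_fspath(checkout)     # str, as returned by __fspath__()
as_bytes = os.fsencode(checkout)     # bytes, via the file system encoding
unchanged = compat_fspath("/tmp/x")  # a plain str passes through untouched
```

Going through ``os.fspath()``, ``os.fsencode()`` and ``os.fsdecode()`` rather than calling ``__fspath__()`` directly keeps the caller from having to care whether the object hands back ``str`` or ``bytes``.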
>>
>>
>> Providing a ``path`` attribute
>> ------------------------------
>>
>> To help deal with the issue of ``pathlib.PurePath`` not inheriting
>> from ``str``, originally it was proposed to introduce a ``path``
>> attribute to mirror what ``os.DirEntry`` provides. In the end,
>> though, it was determined that a protocol would provide the same
>> result while not directly exposing an API that most people will never
>> need to interact with directly.
>>
>>
>> Have ``__fspath__()`` only return strings
>> ------------------------------------------
>>
>> Much of the discussion that led to this PEP revolved around whether
>> ``__fspath__()`` should be polymorphic and return ``bytes`` as well as
>> ``str`` or only return ``str``. The general sentiment for this view
>> was that ``bytes`` are difficult to work with due to their
>> inherent lack of information about their encoding and PEP 383 makes
>> it possible to represent all file system paths using ``str`` with the
>> ``surrogateescape`` handler. Thus, it would be better to forcibly
>> promote the use of ``str`` as the low-level path representation for
>> high-level path objects.
>>
>> In the end, it was decided that using ``bytes`` to represent paths is
>> simply not going to go away and thus they should be supported to some
>> degree. The hope is that people will gravitate towards path objects
>> like pathlib and that will move people away from operating directly
>> with ``bytes``.
>>
>>
>> A generic string encoding mechanism
>> -----------------------------------
>>
>> At one point there was a discussion of developing a generic mechanism
>> to extract a string representation of an object that had semantic
>> meaning (``__str__()`` does not necessarily return anything of
>> semantic significance beyond what may be helpful for debugging). In
>> the end, it was deemed to lack a motivating need beyond the one this
>> PEP is trying to solve in a specific fashion.
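
The PEP 383 argument in the "Have ``__fspath__()`` only return strings" section can be made concrete with a small sketch. It uses an explicit UTF-8 codec with the ``surrogateescape`` handler rather than ``os.fsdecode()``, because the real file system encoding and error handler vary by platform; the file name is invented for illustration.

```python
# A byte string that is not valid UTF-8 (0xE9 is Latin-1 "e acute").
raw_name = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xE9 becomes U+DCE9) instead of raising UnicodeDecodeError.
as_str = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_str.encode("utf-8", "surrogateescape")
```

The lossless round trip is what allows an arbitrary ``bytes`` path to be carried as ``str``, which was the basis of the str-only suggestion.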
>>
>>
>> Have __fspath__ be an attribute
>> -------------------------------
>>
>> It was briefly considered to have ``__fspath__`` be an attribute
>> instead of a method. This was rejected for two reasons. One,
>> historically protocols have been implemented as "magic methods" and
>> not "magic methods and attributes". Two, there is no guarantee that
>> the lower-level representation of a path object will be pre-computed,
>> potentially misleading users that there was no expensive computation
>> behind the scenes in case the attribute was implemented as a property.
>>
>> This also indirectly ties into the idea of introducing a ``path``
>> attribute to accomplish the same thing. This idea has an added issue,
>> though, of accidentally having any object with a ``path`` attribute
>> meet the protocol's duck typing. Introducing a new magic method for
>> the protocol helpfully avoids any accidental opting into the protocol.
>>
>>
>> Provide specific type hinting support
>> -------------------------------------
>>
>> There was some consideration to providing a generic ``typing.PathLike``
>> class which would allow for e.g. ``typing.PathLike[str]`` to specify
>> a type hint for a path object which returned a string representation.
>> While potentially beneficial, the usefulness was deemed too small to
>> bother adding the type hint class.
>>
>> This also removed any desire to have a class in the ``typing`` module
>> which represented the union of all acceptable path-representing types
>> as that can be represented with
>> ``typing.Union[str, bytes, os.PathLike]`` easily enough and the hope
>> is users will slowly gravitate to path objects only.
>>
>>
>> Provide ``os.fspathb()``
>> ------------------------
>>
>> It was suggested that to mirror the structure of e.g.
>> ``os.getcwd()``/``os.getcwdb()``, that ``os.fspath()`` only return
>> ``str`` and that another function named ``os.fspathb()`` be
>> introduced that only returned ``bytes``.
>> This was rejected as the
>> purposes of the ``*b()`` functions are tied to querying the file
>> system where there is a need to get the raw bytes back. As this PEP
>> does not work directly with data on a file system (but which *may*
>> be), the view was taken this distinction is unnecessary. It's also
>> believed that the need for only bytes will not be common enough to
>> need to support in such a specific manner as ``os.fsencode()`` will
>> provide similar functionality.
>>
>>
>> Call ``__fspath__()`` off of the instance
>> -----------------------------------------
>>
>> An earlier draft of this PEP had ``os.fspath()`` calling
>> ``path.__fspath__()`` instead of ``type(path).__fspath__(path)``. This
>> was changed to be consistent with how other magic methods in Python are
>> resolved.
>>
>>
>> Acknowledgements
>> ================
>>
>> Thanks to everyone who participated in the various discussions related
>> to this PEP that spanned both python-ideas and python-dev. Special
>> thanks to Stephen Turnbull for direct feedback on early drafts of this
>> PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not
>> only feedback on early drafts of this PEP but also helping to drive
>> the overall discussion on this topic across the two mailing lists.
>>
>>
>> References
>> ==========
>>
>> .. [#python-ideas-archive] The python-ideas mailing list archive
>>    (https://mail.python.org/pipermail/python-ideas/)
>>
>> .. [#python-dev-archive] The python-dev mailing list archive
>>    (https://mail.python.org/pipermail/python-dev/)
>>
>> .. [#libc-open] ``open()`` documentation for the C standard library
>>    (http://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html)
>>
>> .. [#pathlib] The ``pathlib`` module
>>    (https://docs.python.org/3/library/pathlib.html#module-pathlib)
>>
>> .. [#builtins-open] The ``builtins.open()`` function
>>    (https://docs.python.org/3/library/functions.html#open)
>>
>> .. [#os-fsencode] The ``os.fsencode()`` function
>>    (https://docs.python.org/3/library/os.html#os.fsencode)
>>
>> .. [#os-fsdecode] The ``os.fsdecode()`` function
>>    (https://docs.python.org/3/library/os.html#os.fsdecode)
>>
>> .. [#os-direntry] The ``os.DirEntry`` class
>>    (https://docs.python.org/3/library/os.html#os.DirEntry)
>>
>> .. [#os-path] The ``os.path`` module
>>    (https://docs.python.org/3/library/os.path.html#module-os.path)
>>
>>
>> Copyright
>> =========
>>
>> This document has been placed in the public domain.
>>
>>
>> ..
>>    Local Variables:
>>    mode: indented-text
>>    indent-tabs-mode: nil
>>    sentence-end-double-space: t
>>    fill-column: 70
>>    coding: utf-8
>>    End:
>>
>>

--
--Guido van Rossum (python.org/~guido)