On 12.05.16 01:13, Brett Cannon wrote:


On Wed, 11 May 2016 at 13:45 Serhiy Storchaka <storch...@gmail.com> wrote:

    On 11.05.16 19:43, Brett Cannon wrote:
     > os.path
     > '''''''
     >
     > The various path-manipulation functions of ``os.path`` [#os-path]_
     > will be updated to accept path objects. For polymorphic functions that
     > accept both bytes and strings, they will be updated to simply use
     > code very much similar to
     > ``path.__fspath__() if hasattr(path, '__fspath__') else path``. This
     > will allow for their pre-existing type-checking code to continue to
     > function.

    I am afraid that this will hurt performance. Some os.path functions are
    used in tight loops and are heavily optimized, so adding support for the
    path protocol can have a visible negative effect.


As others have asked, what specific examples do you have where os.path is
used in a tight loop w/o any I/O that would overwhelm the performance cost?

Most examples do some I/O (like os.lstat()): posixpath.realpath(), os.walk(), glob.glob(). But os.walk(), for example, was significantly sped up by switching to os.scandir(), and it would be sad to make it slower again. os.path is used in a number of files, sometimes in loops, sometimes indirectly. It is hard to find all the examples.

Functions such as glob.glob() call split() and join() for every component, but they also apply string or bytes operations to paths. So they need to convert the argument to str or bytes before starting the iteration, and then always call os.path functions with str or bytes only. An additional conversion in every os.path function is redundant. I suppose most other high-level functions that manipulate paths in a loop should also convert their arguments once at the start, and so don't need path protocol support in the os.path functions.
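
To illustrate the "convert once at the start" approach, here is a minimal sketch. The helper names to_str_or_bytes() and join_all() are hypothetical and exist only for this example; the conversion expression is the one quoted from the PEP draft above, not necessarily what the stdlib ends up doing.

```python
import os.path


def to_str_or_bytes(path):
    """Hypothetical helper: the conversion idiom quoted from the PEP draft."""
    return path.__fspath__() if hasattr(path, '__fspath__') else path


def join_all(root, names):
    """Hypothetical high-level function that joins many names onto one root."""
    # Convert the argument once, before the loop starts.
    root = to_str_or_bytes(root)
    # Inside the loop, os.path.join() only ever sees str or bytes,
    # so it does not need to know about the path protocol at all.
    return [os.path.join(root, name) for name in names]
```

With this shape the protocol check costs one hasattr() call per join_all() call instead of one per os.path.join() call.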

I see this whole discussion breaking down into a few groups, which
changes what gets done upfront and what might be done farther down the line:

 1. Maximum acceptance: do whatever we can to make all representations of
    paths just work, which means making all places working with a path
    in the stdlib accept path objects, str, and bytes.
 2. Safely use path objects: __fspath__() is there to signal an object
    is a file system path and to get back a lower-level representation
    so people stop calling str() on everything, providing some interface
    signaling that someone doesn't misuse an object as a path and only
    changing path-consuming APIs -- e.g. open() -- and not path
    manipulation APIs -- e.g. os.path -- in the stdlib.
 3. It ain't worth it: those that would rather just skip all of this and
    drop pathlib from the stdlib.

Ethan and Koos are in group #1 and I'm personally in group #2 but I
tried to compromise somewhat and find a middle ground in the PEP with
the level of changes in the stdlib but being more restrictive with
os.fspath(). If I were doing a pure group #2 PEP I would drop os.path
changes and make os.fspath() do what Ethan and Koos have suggested and
simply pass through without checks whatever path.__fspath__() returned
if the argument wasn't str or bytes.
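
The difference between the two os.fspath() behaviours discussed above can be sketched roughly as follows. Both function names are hypothetical, and the "restrictive" variant is only one reading of "more restrictive" (checking the type of what __fspath__() returns); neither is the actual implementation.

```python
def fspath_restrictive(path):
    """Hypothetical: only ever return str or bytes, otherwise raise TypeError."""
    if isinstance(path, (str, bytes)):
        return path
    if hasattr(path, '__fspath__'):
        result = path.__fspath__()
        # Assumed check: reject anything that is not str or bytes.
        if isinstance(result, (str, bytes)):
            return result
        raise TypeError("__fspath__() returned %s, expected str or bytes"
                        % type(result).__name__)
    raise TypeError("expected str, bytes or an object with __fspath__(), "
                    "not %s" % type(path).__name__)


def fspath_passthrough(path):
    """Hypothetical: return whatever __fspath__() gives back, unchecked."""
    if isinstance(path, (str, bytes)):
        return path
    return path.__fspath__()
```

For str and bytes arguments, and for well-behaved path objects, the two variants give the same result. They differ only when __fspath__() returns something unexpected: the first raises immediately, the second hands the value on to the caller.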

I'm for adding conversions in the C-implemented path-consuming APIs, and maybe in high-level path manipulation functions like os.walk(), but leaving the low-level APIs of os.path, fnmatch and glob unchanged.
