On 12.05.16 01:13, Brett Cannon wrote:
> On Wed, 11 May 2016 at 13:45 Serhiy Storchaka <storch...@gmail.com> wrote:
> > On 11.05.16 19:43, Brett Cannon wrote:
> > > os.path
> > > '''''''
> > >
> > > The various path-manipulation functions of ``os.path`` [#os-path]_
> > > will be updated to accept path objects. For polymorphic functions that
> > > accept both bytes and strings, they will be updated to simply use
> > > code very much similar to
> > > ``path.__fspath__() if hasattr(path, '__fspath__') else path``. This
> > > will allow for their pre-existing type-checking code to continue to
> > > function.
> >
> > I'm afraid that this will hurt performance. Some os.path functions are
> > used in tight loops, they are heavily optimized, and adding support for
> > the path protocol can have a visible negative effect.
>
> As others have asked, what specific examples do you have that os.path is
> used in a tight loop w/o any I/O that would overwhelm the performance?
Most examples do some I/O (like os.lstat()): posixpath.realpath(), os.walk(), glob.glob(). But os.walk(), for example, was significantly sped up by using os.scandir(), and it would be sad to make it slower again. os.path is used in a number of files, sometimes in loops, sometimes indirectly. It is hard to find all the examples.
Functions such as glob.glob() call split() and join() for every component, but they also use string or bytes operations on paths. So they need to convert the argument to str or bytes before starting the iteration, and then always call os.path functions only with str or bytes. An additional conversion in every os.path function is redundant. I suppose most other high-level functions that manipulate paths in a loop should also convert their arguments once at the start, and don't need support for the path protocol in os.path functions.
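To illustrate the "convert once at the start" pattern, here is a minimal sketch. The helper name split_components() is hypothetical and not part of the stdlib, and it assumes os.fspath() exists as proposed in the PEP (it is available in Python 3.6 and later). The loop itself only ever passes str or bytes to os.path.split().

```python
import os


def split_components(path):
    """Split a path into its components (hypothetical helper).

    The argument may be str, bytes, or a path object; it is converted
    exactly once, before the loop starts.
    """
    # Single up-front conversion (assumes os.fspath() as proposed in the PEP).
    path = os.fspath(path)
    parts = []
    while True:
        # From here on, os.path.split() only ever sees str or bytes.
        head, tail = os.path.split(path)
        if tail:
            parts.append(tail)
        if head == path:
            # Cannot split any further: head is either a root or empty.
            if head:
                parts.append(head)
            break
        path = head
    parts.reverse()
    return parts
```

With this shape, os.path.split() needs no knowledge of the path protocol, since the caller has already done the conversion.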
> I see this whole discussion breaking down into a few groups, which changes
> what gets done upfront and what might be done farther down the line:
>
> 1. Maximum acceptance: do whatever we can to make all representations of
>    paths just work, which means making all places working with a path in
>    the stdlib accept path objects, str, and bytes.
> 2. Safely use path objects: __fspath__() is there to signal an object is a
>    file system path and to get back a lower-level representation so people
>    stop calling str() on everything, providing some interface signaling
>    that someone doesn't misuse an object as a path and only changing path
>    consumption APIs -- e.g. open() -- and not path manipulation APIs --
>    e.g. os.path -- in the stdlib.
> 3. It ain't worth it: those that would rather just skip all of this and
>    drop pathlib from the stdlib.
>
> Ethan and Koos are in group #1 and I'm personally in group #2, but I tried
> to compromise somewhat and find a middle ground in the PEP with the level
> of changes in the stdlib but being more restrictive with os.fspath(). If I
> were doing a pure group #2 PEP I would drop os.path changes and make
> os.fspath() do what Ethan and Koos have suggested and simply pass through
> without checks whatever path.__fspath__() returned if the argument wasn't
> str or bytes.
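For reference, the two os.fspath() behaviours being contrasted could be sketched roughly as below. Both function names are hypothetical and exist only for illustration; neither is claimed to be the actual stdlib implementation. The strict variant checks what __fspath__() returns, while the pass-through variant returns it unchecked.

```python
def fspath_strict(path):
    """Hypothetical 'restrictive' variant: always returns str or bytes."""
    if isinstance(path, (str, bytes)):
        return path
    try:
        # Look the method up on the type, as is usual for protocol methods.
        fspath = type(path).__fspath__
    except AttributeError:
        raise TypeError(
            "expected str, bytes or path object, not "
            + type(path).__name__
        ) from None
    result = fspath(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError(
            "__fspath__() returned " + type(result).__name__
            + ", expected str or bytes"
        )
    return result


def fspath_passthrough(path):
    """Hypothetical 'pass through' variant: no check on the result."""
    if isinstance(path, (str, bytes)):
        return path
    # Whatever __fspath__() returns is handed back unchecked.
    return path.__fspath__()
```

The difference only shows up for objects whose __fspath__() returns something other than str or bytes: the strict variant raises TypeError, the pass-through variant returns the value as is.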
I'm for adding conversions in C-implemented path-consuming APIs, and maybe in high-level path manipulation functions like os.walk(), but leaving the low-level APIs of os.path, fnmatch and glob unchanged.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com