On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
On Wed, May 24, 2017 at 12:18:16AM +0300, Serhiy Storchaka wrote:

It seems to me that the purpose of this proposition is not performance,
but the possibility to use __fspath__ in str or bytes subclasses.
Currently defining __fspath__ in str or bytes subclasses doesn't have
any effect.

That's how I interpreted the proposal, with any performance issue being
secondary. (I don't expect that converting path-like objects to strings
would be the bottleneck in any application doing actual disk IO.)

I don't know a reasonable use case for this feature. The __fspath__
method of str or bytes subclasses returning something not equivalent to
self looks confusing to me.

I can imagine at least two:

- emulating something like DOS 8.3 versus long file names;
- case normalisation

but what would make this really useful is debugging. For instance, I
have used something like this to debug problems with int() being called
wrongly:

py> class MyInt(int):
...     def __int__(self):
...             print("__int__ called")
...             return super().__int__()
...
py> x = MyInt(23)
py> int(x)
__int__ called
23

It would be annoying and inconsistent if int(x) avoided calling __int__
on int subclasses. But that's exactly what happens with fspath and str.
I see that as a bug, not a feature: I find it hard to believe that we
would design an interface for string-like objects (paths) and then
intentionally prohibit it from applying to strings.
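Here is the same debugging trick moved over to paths, as a minimal sketch. `MyPath` and the `calls` list are illustrative names. On CPython 3.6 only the `__int__` call gets recorded, because `os.fspath()` hands back str instances, subclasses included, without ever looking for `__fspath__`:

```python
import os

calls = []  # records which conversion hooks actually ran

class MyInt(int):
    def __int__(self):
        calls.append('__int__')
        return super().__int__()

class MyPath(str):
    def __fspath__(self):
        calls.append('__fspath__')
        return str(self)

int(MyInt(23))          # int() consults __int__ on the subclass
os.fspath(MyPath('a'))  # os.fspath() returns the str subclass untouched
print(calls)            # ['__int__']
```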

And if we did, surely it's a misfeature. Why *shouldn't* subclasses of
str get the same opportunity to customize the result of __fspath__ as
they get to customize their __repr__ and __str__?

py> class MyStr(str):
...     def __repr__(self):
...             return 'repr'
...     def __str__(self):
...             return 'str'
...
py> s = MyStr('abcdef')
py> repr(s)
'repr'
py> str(s)
'str'
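The same class can be extended with an `__fspath__` that does case normalisation, one of the use cases mentioned above. This is a sketch, and the casefolding `__fspath__` is hypothetical. The `__repr__` and `__str__` customisations are honoured, but on CPython 3.6 `os.fspath()` bypasses the new method:

```python
import os

class MyStr(str):
    """Steven's example, plus a hypothetical case-normalising __fspath__."""
    def __repr__(self):
        return 'repr'
    def __str__(self):
        return 'str'
    def __fspath__(self):
        # str.__str__ yields the plain underlying text, which we casefold.
        return str.__str__(self).casefold()

s = MyStr('ReadMe.TXT')
print(repr(s))            # repr      <- customisation honoured
print(str(s))             # str       <- customisation honoured
print(s.__fspath__())     # readme.txt
print(os.fspath(s) is s)  # True      <- __fspath__ bypassed, s returned as-is
```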


This is almost exactly what I have been thinking (just that I couldn't have presented it so clearly)!

Let's look at a potential use case for this. Assume that in a package you want to handle several paths to different files and directories that are all located in a common package-specific parent directory. Then, using the path protocol, you could write this:

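import os

class PackageBase(object):
    basepath = '/home/.package'

class PackagePath(str, PackageBase):
    # Under the proposal, os.fspath() (and hence open()) would call this;
    # today it is ignored because PackagePath is a str subclass.
    def __fspath__(self):
        return os.path.join(self.basepath, str(self))

config_file = PackagePath('.config')
log_file = PackagePath('events.log')
data_dir = PackagePath('data')

# Append mode, since the block writes to the file.
with open(log_file, 'a') as log:
    log.write('package paths initialized.\n')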


Except that this wouldn't currently work, because PackagePath inherits from str. Of course, there are other ways to achieve the above, but when you think about designing a path-like object class, str is just a pretty attractive base class to start from.
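For comparison, here is one of those other ways. It assumes that giving up the str base class is acceptable, and it is a hypothetical rewrite rather than code from the proposal. The wrapper subclasses the `os.PathLike` ABC, which has been available since Python 3.6. It works today because `os.fspath()` only short-circuits on actual str/bytes instances:

```python
import os

class PackagePath(os.PathLike):
    """Hypothetical alternative: wrap the relative name instead of inheriting from str."""
    basepath = '/home/.package'

    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        return os.path.join(self.basepath, self.name)

    def __repr__(self):
        return f'PackagePath({self.name!r})'

log_file = PackagePath('events.log')
print(os.fspath(log_file))  # /home/.package/events.log on POSIX
```

The price is that `log_file` no longer has any of the str methods, which is exactly why a str base class is tempting.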

Now let's look at the compatibility of a class like PackagePath under this proposal:

- if client code uses, e.g., str(config_file) and proceeds to treat the resulting object as a path, unexpected things will happen and, yes, that's bad. However, this is no different from any other path-like object for which __str__ and __fspath__ don't return the same value.
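To make that comparison concrete, here is a path-like class that is not based on str and whose `__str__` is a human-readable label rather than a path. The class name and the path are made up for illustration:

```python
import os

class LogLocation(os.PathLike):
    """Hypothetical path-like whose str() is a human label, not a path."""

    def __fspath__(self):
        return '/var/tmp/events.log'

    def __str__(self):
        return 'the event log'

loc = LogLocation()
print(os.fspath(loc))  # /var/tmp/events.log
print(str(loc))        # the event log  (passing this to open() would be a bug)
```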

- if client code uses the PEP-recommended backwards-compatible way of dealing with paths,

path.__fspath__() if hasattr(path, "__fspath__") else path

things will just work. Interestingly, this idiom would *currently* produce an unexpected result, namely that it would execute the __fspath__ method of the str subclass, whereas os.fspath() ignores it.
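The current inconsistency can be shown directly. In this sketch, `pep519_fspath` is a hypothetical name for the quoted idiom wrapped in a function, and `PackagePath` is a trimmed-down version of the class above:

```python
import os

def pep519_fspath(path):
    # The backwards-compatible idiom quoted above, wrapped in a function.
    return path.__fspath__() if hasattr(path, "__fspath__") else path

class PackagePath(str):
    basepath = '/home/.package'

    def __fspath__(self):
        return os.path.join(self.basepath, str(self))

p = PackagePath('events.log')
print(pep519_fspath(p))  # /home/.package/events.log on POSIX: idiom honours __fspath__
print(os.fspath(p))      # events.log: os.fspath() skips __fspath__ for str subclasses
```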

- if client code uses instances of PackagePath as paths directly, then in Python 3.6 and below that would lead to an unintended outcome, while in Python 3.7 (with the proposal in place) things would work. This is *really* bad.

But what this means is that, under the proposal, using a str or bytes subclass with an __fspath__ method defined makes your code backwards-incompatible, and the solution would be not to use such a class if you want to stay backwards-compatible (and that should get documented somewhere). This restriction, of course, limits the usefulness of the proposal in the near future, but that disadvantage will vanish over time. In five years, dropping support for Python 3.6 probably won't be a big deal (for comparison, Python 3.2 was released six years ago, and since last year pip no longer supports it). As Steven pointed out, the proposal is *very* unlikely to break existing code.
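If a library wanted to use such a class only where it behaves as intended, one option is to probe the behaviour at import time instead of hard-coding a version number. This is a minimal sketch, and `fspath_honours_str_subclasses` is a hypothetical helper. On CPython 3.6 it returns False:

```python
import os

def fspath_honours_str_subclasses():
    """Return True if os.fspath() calls __fspath__ on str subclasses."""
    class _Probe(str):
        def __fspath__(self):
            return 'probed'
    return os.fspath(_Probe('plain')) == 'probed'

print(fspath_honours_str_subclasses())  # False on CPython 3.6
```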

So to summarize, the proposal

- avoids an up-front isinstance check in the protocol and thereby speeds up the processing of exact strings and bytes and of anything that follows the path protocol.*

- slows down the processing of instances of regular str and bytes subclasses*

- makes the "path.__fspath__() if hasattr(path, "__fspath__") else path" idiom consistent for subclasses of str and bytes that define __fspath__

- opens up the opportunity to write str/bytes subclasses that represent a path other than just themselves in the future**
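The difference in lookup order can be sketched in pure Python. Both functions below are illustrative approximations written for this comparison, not CPython's C implementation. `fspath_current` roughly follows the pure-Python fallback in the os module, and `fspath_proposed` is one reading of the proposal. `Shouty` is a made-up example class:

```python
def _check(result, obj):
    # Both variants insist that the final result is str or bytes.
    if not isinstance(result, (str, bytes)):
        raise TypeError(f"{type(obj).__name__}.__fspath__() returned "
                        f"{type(result).__name__}, not str or bytes")
    return result

def fspath_current(path):
    """Pure-Python sketch of the CPython 3.6 lookup order."""
    if isinstance(path, (str, bytes)):      # up-front check: subclasses stop here
        return path
    fspath = getattr(type(path), '__fspath__', None)
    if fspath is None:
        raise TypeError(f"expected str, bytes or os.PathLike object, "
                        f"not {type(path).__name__}")
    return _check(fspath(path), path)

def fspath_proposed(path):
    """Hypothetical sketch of the proposed lookup order."""
    if type(path) in (str, bytes):          # exact types: fast path
        return path
    fspath = getattr(type(path), '__fspath__', None)
    if fspath is not None:                  # path protocol wins, even for subclasses
        return _check(fspath(path), path)
    if isinstance(path, (str, bytes)):      # plain subclasses: the slower path
        return path
    raise TypeError(f"expected str, bytes or os.PathLike object, "
                    f"not {type(path).__name__}")

class Shouty(str):
    def __fspath__(self):
        return self.upper()

s = Shouty('log.txt')
print(fspath_current(s))   # log.txt
print(fspath_proposed(s))  # LOG.TXT
```

In this sketch, exact str/bytes and path-like objects skip the isinstance call. A str subclass without `__fspath__` pays for both the attribute lookup and the isinstance call, which matches the trade-off listed above.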

Still sounds like a net win to me, but let's see what I forgot ...

* yes, speed is typically not your primary concern when it comes to IO; what's often neglected, though, is that not all path operations have to trigger actual IO (the functions in os.path, for example, typically don't perform any IO)
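As a small illustration of that point, a hypothetical counting path-like object shows that pure string helpers in os.path still go through the path protocol on every call. None of them touch the file system, so the path used does not need to exist:

```python
import os

class CountingPath(os.PathLike):
    """Hypothetical path-like object that counts how often it is converted."""
    conversions = 0

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        CountingPath.conversions += 1
        return self._path

p = CountingPath('/no/such/dir/events.log')
# Pure string manipulation: no file system access, but each call
# converts p through the path protocol first.
os.path.basename(p)
os.path.dirname(p)
os.path.join(p, 'x')
print(CountingPath.conversions)
```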

** somebody on the list (I guess it was Koos?) mentioned that such classes would only make sense if Python ever disallowed the use of str/bytes as paths, but I don't think that is a prerequisite here.

Wolfgang

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
