Re: [Python-Dev] pathlib - current status of discussions
On 17 April 2016 at 04:47, Chris Barker - NOAA Federal wrote:
>> On Apr 13, 2016, at 8:31 PM, Nick Coghlan wrote:
>>>> class Special(bytes):
>>>>     def __fspath__(self):
>>>>         return 'str-val'
>>>> obj = Special('bytes-val', 'utf8')
>>>> path_obj = fspath(obj, allow_bytes=True)
>>>>
>>>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>>
>> In this kind of case, inheritance tends to trump protocol.
>
> Sure, but...
>
>> example, int subclasses can't override operator.index:
> ...
>> The reasons for that behaviour are more pragmatic than philosophical:
>> builtins and their subclasses are extensively special-cased for speed
>> reasons,
>
> OK, but in this case, purity can beat practicality. If the author
> writes an __fspath__ method, presumably it's because it should be
> used.
>
> And I can certainly imagine one might want to store a path
> representation as bytes, but NOT want the raw bytes passed off to file
> handling libs.
>
> (of course you could use composition rather than subclassing if you had to)

Exactly - inheritance is a really strong relationship that directly
affects the in-memory layout of instances (at least in CPython), and
also the kinds of assumptions other code will make about that type
(for example, subclasses are special-cased to allow them to override
the behaviour of numeric binary operators when they appear as the
right operand with an instance of the parent type as the left operand,
while with unrelated types, the left operand always gets the first
chance to handle the operation).

When folks don't want to trigger those "this is an " behaviours, the
appropriate design pattern is composition, not inheritance (and many
of the ABCs were introduced to make it easier to implement particular
interfaces without inheriting from the corresponding builtin types).

Cheers,
Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
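The two special cases Nick mentions can be seen in a few lines. This is a sketch against current CPython 3.x, and the class names `Base`, `Sub`, `Unrelated` and `MyInt` are invented for the example: a subclass that overrides a reflected operator is consulted before its parent's forward operator, an unrelated type is not, and `operator.index()` accepts an `int` subclass as-is without calling its `__index__`.

```python
import operator


class Base:
    def __add__(self, other):
        return "Base.__add__"


class Sub(Base):
    # A subclass that overrides the reflected method is tried first,
    # even though it is the right-hand operand.
    def __radd__(self, other):
        return "Sub.__radd__"


class Unrelated:
    # An unrelated type is only asked if the left operand gives up.
    def __radd__(self, other):
        return "Unrelated.__radd__"


class MyInt(int):
    # Never called by operator.index(): int instances (subclasses
    # included) are accepted before __index__ is looked up.
    def __index__(self):
        return 42


print(Base() + Sub())            # Sub.__radd__
print(Base() + Unrelated())      # Base.__add__
print(operator.index(MyInt(7)))  # 7
```

Since Python 3.10, `operator.index()` also converts an `int` subclass to an exact `int`, so only the value, not the type, is relied on above.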
Re: [Python-Dev] pathlib - current status of discussions
> On Apr 13, 2016, at 8:31 PM, Nick Coghlan wrote:
>
>>> class Special(bytes):
>>>     def __fspath__(self):
>>>         return 'str-val'
>>> obj = Special('bytes-val', 'utf8')
>>> path_obj = fspath(obj, allow_bytes=True)
>>>
>>> With #2, path_obj == 'bytes-val'. With #3, path_obj == 'str-val'.
>
> In this kind of case, inheritance tends to trump protocol.

Sure, but...

> example, int subclasses can't override operator.index:
...
> The reasons for that behaviour are more pragmatic than philosophical:
> builtins and their subclasses are extensively special-cased for speed
> reasons,

OK, but in this case, purity can beat practicality. If the author
writes an __fspath__ method, presumably it's because it should be
used.

And I can certainly imagine one might want to store a path
representation as bytes, but NOT want the raw bytes passed off to file
handling libs.

(of course you could use composition rather than subclassing if you had to)

-CHB
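The `fspath(obj, allow_bytes=True)` call above belongs to a proposal under discussion; the function that eventually shipped in Python 3.6 is `os.fspath(path)`, with no `allow_bytes` flag. Assuming Python 3.6+, this sketch shows that `os.fspath` hands back a `bytes` subclass unchanged without consulting its `__fspath__` (inheritance wins), and contrasts it with a composition-based wrapper. `BytesBackedPath` is a hypothetical name used only for illustration.

```python
import os


class Special(bytes):
    # Inheritance: this *is* a bytes object, so os.fspath() returns it
    # directly and never calls the __fspath__ defined here.
    def __fspath__(self):
        return "str-val"


class BytesBackedPath:
    # Composition: bytes are stored internally, but the object is not a
    # bytes instance, so os.fspath() has to go through __fspath__.
    def __init__(self, raw: bytes):
        self._raw = raw

    def __fspath__(self) -> str:
        return os.fsdecode(self._raw)


obj = Special("bytes-val", "utf8")
print(os.fspath(obj))                            # b'bytes-val'
print(os.fspath(BytesBackedPath(b"bytes-val")))  # bytes-val
```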
Re: [Python-Dev] pathlib - current status of discussions
On 15 April 2016 at 00:01, Random832 wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely encompasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>> >>> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".

Bytes paths on Windows are encoded as mbcs for use with the ASCII-only
Windows APIs, and hence don't support the full range of characters
that str does. The colloquial shorthand for that is "bytes paths don't
work properly on Windows" (the more strictly accurate description is
"bytes paths only work correctly on Windows if every code point in the
path can be encoded using the 'mbcs' codec").

Even on *nix, os.fsencode may fail outright if the system is
configured to use a non-universal encoding, while os.fsdecode may
pollute the resulting string with surrogate-escaped characters.

Regardless of platform, if somebody hands you *mixed* bytes and str
data, the appropriate default reaction is to complain about it rather
than assume they meant one or the other.
That complaint may take one of two forms:

- for a high level, platform independent API, bytes should just be
  rejected outright
- for a low level API with input type dependent behaviour, the input
  should be rejected as ambiguous: the API doesn't know whether the
  str behaviour or the bytes behaviour is the intended one

pathlib falls into the first category: it just rejects bytes as input.

os.path.join falls into the second category: all str is fine, and all
bytes is fine, but mixing them fails.

However, once somebody reaches for the coercion APIs (fsdecode and
fsencode), they're now *explicitly* telling the interpreter what they
want, since there's no ambiguity about the possible return types from
those functions.

In relation to Victor's comment about this being complex code to show
to a novice:

    os.path.join(*map(os.fsdecode, ("str", b"bytes")))

I agree, but also think that's a good reason for people to switch to
teaching novices pathlib rather than os.path, and letting them
discover the underlying libraries as required by the code and examples
they encounter.

Cheers,
Nick.
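The behaviours Nick describes can be checked directly. In this sketch `cp1252` stands in for a Windows 'mbcs' code page (an assumption: the real ANSI code page depends on the system), `posixpath` is used instead of `os.path` only so the output is identical on every platform, and `raises` is a small helper written for the example.

```python
import os
import posixpath
from pathlib import PurePosixPath


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


# Stand-in for a legacy Windows code page: not every str path can be
# represented as bytes in it.
print(raises(UnicodeEncodeError, "日本.txt".encode, "cp1252"))  # True

# High-level API: pathlib rejects bytes outright.
print(raises(TypeError, PurePosixPath, b"bytes"))  # True

# Low-level API: all-str and all-bytes both work...
print(posixpath.join("a", "b"))    # a/b
print(posixpath.join(b"a", b"b"))  # b'a/b'
# ...but mixing them is rejected as ambiguous.
print(raises(TypeError, posixpath.join, "str", b"bytes"))  # True

# Explicit coercion removes the ambiguity.
print(posixpath.join(*map(os.fsdecode, ("str", b"bytes"))))  # str/bytes
```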
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016 at 9:35 PM, Random832 wrote:
> On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
>> (1) Code that has access to pathname/filename data and has some level
>> of control over what data type comes in. This code may for instance
>> choose to deal with either bytes or str
>>
>> (2) Code that takes the path or file name that it happens to get and
>> does something with it. This type of code can be divided into
>> subgroups as follows:
>>
>> (2a) Code that accepts only one type of paths (e.g. str, bytes or
>> pathlib) and fails if it gets something else.
>
> Ideally, these should go away.
>

I don't think so. (1) might even be the most common type of all code.
This is code that gets a path from user input, from a config file,
from a database etc. and then does things with it, typically including
passing it to type (2) code and potentially getting a path back from
there too.

>> (2b) Code that wants to support different types of paths such as
>> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
>> and various other standard library code. Presumably there is also
>> third-party code that does the same. These functions may want to
>> preserve the str-ness or bytes-ness of the paths in case they return
>> paths, as the stdlib now does. But new code may even want to return
>> pathlib objects when they get such objects as inputs.
>
> Hold on. None of the discussion I've seen has included any way to
> specify how to construct a new object representing a different path
> other than the ones passed in. Surely you're not suggesting type(a)(b).
>

That's right. This protocol is not solving the issue of returning
'rich' path objects. It's solving the issue of passing those objects
to lower-level functions, or of letting them interact with other
'rich' path types. What I meant by this is that there may be code that
*does* want to do type(a)(b), which is out of our control. Maybe I
should not have mentioned that.
> Also, how does DirEntry fit in with any of this?
>

os.scandir + DirEntry are one of the many things in the stdlib that
give you pathnames of the same type as those that were put in.

>> This is the
>> duck-typing or polymorphic code we have been talking about. Code of
>> this type (2b) may want to avoid implicit conversions because it makes
>> the life of code of the other types more difficult.
>
> As long as the type it returns is still a path/bytes/str (and therefore
> can be accepted when the caller passes it somewhere else) what's the
> problem?

No, because not all paths are passed to the function that does the
implicit conversion, and then, for instance, when os.path.join-ing two
paths of different types, it raises an error.

In other words: Most non-library code (even library code?) deals with
one specific type and does not want implicit conversions to other
types. Some code (2b) deals with several types and, at least in the
stdlib, such code returns paths of the same type as they are given,
which makes said "most non-library code" happy, because it does not
force the programmer to think about type conversions.

(Then there is also code that explicitly deals with type conversions,
such as os.fsencode and os.fsdecode.)

-Koos
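A quick way to see the type-preserving (2b) behaviour Koos describes, assuming Python 3.6+ on a POSIX system (bytes arguments to `os.scandir` have historically had platform-specific restrictions): scanning the same directory through a `str` path and a `bytes` path yields `DirEntry` objects whose `name` and `os.fspath()` results have the matching type, and `os.path` helpers such as `basename` follow the same rule.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # One file, so each scan has exactly one entry to report.
    open(os.path.join(d, "a.txt"), "w").close()

    with os.scandir(d) as it:
        str_entry = next(it)
    with os.scandir(os.fsencode(d)) as it:
        bytes_entry = next(it)

# DirEntry keeps its name/path after the scan, so these need no disk access.
print(type(str_entry.name).__name__, type(bytes_entry.name).__name__)  # str bytes
print(type(os.fspath(bytes_entry)).__name__)  # bytes

# os.path functions follow the same rule: bytes in, bytes out.
print(os.path.basename("dir/a.txt"))   # a.txt
print(os.path.basename(b"dir/a.txt"))  # b'a.txt'
```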
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016, at 13:56, Koos Zevenhoven wrote:
> (1) Code that has access to pathname/filename data and has some level
> of control over what data type comes in. This code may for instance
> choose to deal with either bytes or str
>
> (2) Code that takes the path or file name that it happens to get and
> does something with it. This type of code can be divided into
> subgroups as follows:
>
> (2a) Code that accepts only one type of paths (e.g. str, bytes or
> pathlib) and fails if it gets something else.

Ideally, these should go away.

> (2b) Code that wants to support different types of paths such as
> str, bytes or pathlib objects. This includes os.path.*, os.scandir,
> and various other standard library code. Presumably there is also
> third-party code that does the same. These functions may want to
> preserve the str-ness or bytes-ness of the paths in case they return
> paths, as the stdlib now does. But new code may even want to return
> pathlib objects when they get such objects as inputs.

Hold on. None of the discussion I've seen has included any way to
specify how to construct a new object representing a different path
other than the ones passed in. Surely you're not suggesting type(a)(b).

Also, how does DirEntry fit in with any of this?

> This is the
> duck-typing or polymorphic code we have been talking about. Code of
> this type (2b) may want to avoid implicit conversions because it makes
> the life of code of the other types more difficult.

As long as the type it returns is still a path/bytes/str (and therefore
can be accepted when the caller passes it somewhere else) what's the
problem?
Re: [Python-Dev] pathlib - current status of discussions
On 04/14/2016 10:22 AM, Paul Moore wrote:
> On 14 April 2016 at 17:46, Ethan Furman wrote:
>> If you are not working at the bytes layer, you shouldn't be getting
>> bytes objects because:
>>
>> - you specified str when asking for data from the OS, or
>> - you transformed the incoming bytes from whatever external source
>>   to str when you received them.
>
> My experience is that (particularly with code that was originally
> written for Python 2) "you have control of your data" is often an
> illusion - bytes can appear in code from unexpected sources, and when
> they do I'd rather see an error if I'm using code where I expect a
> string. Certainly that's a bug in the code - all I'm saying is that it
> should fail early rather than late.

If we have one function that uses a flag and you leave the flag alone
(it defaults to rejecting bytes) -- voila! An error is raised when
bytes show up.

> I'd appreciate it if anyone can clarify why "gracefully extending" the
> protocol to include bytes support at a later date isn't practical.

It's going to be a bunch of work. I don't want to do the work twice.

On the other hand, if while doing the work it becomes apparent that
supporting bytes and str in the protocol is either infeasible,
confusing, or a plain ol' bad idea I have no problem ripping out the
bytes support and going to str only.

--
~Ethan~
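A minimal sketch of the single-function-with-a-flag idea Ethan describes. `fspath_with_flag` and its `allow_bytes` keyword are hypothetical names modelled on the proposal being discussed; the `os.fspath()` that eventually shipped in Python 3.6 takes no such flag and always accepts bytes.

```python
from pathlib import PurePosixPath


def fspath_with_flag(path, *, allow_bytes=False):
    """Hypothetical: return the file system representation of *path*.

    bytes are rejected unless allow_bytes=True is passed explicitly.
    """
    if isinstance(path, (str, bytes)):
        result = path
    else:
        # Look __fspath__ up on the type, as special methods usually are.
        try:
            fspath_method = type(path).__fspath__
        except AttributeError:
            raise TypeError(
                f"expected str, bytes or path-like, not {type(path).__name__}"
            ) from None
        result = fspath_method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError(f"__fspath__ returned {type(result).__name__}")
    if isinstance(result, bytes) and not allow_bytes:
        # Default: fail early when bytes show up unexpectedly.
        raise TypeError("got a bytes path but allow_bytes is False")
    return result


print(fspath_with_flag(PurePosixPath("a/b")))      # a/b
print(fspath_with_flag(b"a/b", allow_bytes=True))  # b'a/b'
try:
    fspath_with_flag(b"a/b")
except TypeError as exc:
    print("rejected:", exc)  # rejected: got a bytes path but allow_bytes is False
```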
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016 at 7:46 PM, Ethan Furman wrote:
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

There is an apparent contradiction of the above with some previous
posts, including your own. Let me try to fix it:

Code that deals with paths can be divided into groups as follows:

(1) Code that has access to pathname/filename data and has some level
of control over what data type comes in. This code may for instance
choose to deal with either bytes or str

(2) Code that takes the path or file name that it happens to get and
does something with it. This type of code can be divided into
subgroups as follows:

(2a) Code that accepts only one type of paths (e.g. str, bytes or
pathlib) and fails if it gets something else.

(2b) Code that wants to support different types of paths such as str,
bytes or pathlib objects. This includes os.path.*, os.scandir, and
various other standard library code. Presumably there is also
third-party code that does the same. These functions may want to
preserve the str-ness or bytes-ness of the paths in case they return
paths, as the stdlib now does. But new code may even want to return
pathlib objects when they get such objects as inputs. This is the
duck-typing or polymorphic code we have been talking about. Code of
this type (2b) may want to avoid implicit conversions because it makes
the life of code of the other types more difficult.

(feel free to fill in more categories of code)

So the code of type (2b) is trying to make all categories happy by
returning objects of the same type that it gets as input, while the
other categories are probably in the situation where they don't
necessarily need to make other categories of code happy.
And the question is this: Do we need to make code using both bytes
*and* scandir happy? This is largely the same question as whether we
have to support bytes in addition to str in the protocol.

(We may of course talk about third-party path libraries that have the
same problem as scandir's DirEntry. Ethan's library is not exactly in
the same category as DirEntry since its path objects *are* instances
of bytes or str and therefore do not need this protocol to begin with,
except perhaps for conversions from other high-level path types so
that different path libraries work together nicely).

-Koos
Re: [Python-Dev] pathlib - current status of discussions
On 14 April 2016 at 17:46, Ethan Furman wrote:
> On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:
>
>> I am saying that if os.path.join now accepts RichPath objects, and those
>> objects can return either str or bytes, then it's much harder to reason
>> about when I have all bytes or all strings. In essence, you will force me
>> to pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
>> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if
>> I have to always do that wrapping then os.path.join doesn't need to accept
>> RichPath objects and call fspath at all.
>
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

My experience is that (particularly with code that was originally
written for Python 2) "you have control of your data" is often an
illusion - bytes can appear in code from unexpected sources, and when
they do I'd rather see an error if I'm using code where I expect a
string. Certainly that's a bug in the code - all I'm saying is that it
should fail early rather than late.

Having said this, I don't have an actual use case - but equally it
seems to me that our problem is that *nobody* does (yet) because
uptake of pathlib has been slow, thanks to limited stdlib support.

My view remains that we should get the (relatively simple and
uncontroversial) str support in place, and defer bytes support for
when we have experience with that. I'd appreciate it if anyone can
clarify why "gracefully extending" the protocol to include bytes
support at a later date isn't practical.
Paul
Re: [Python-Dev] pathlib - current status of discussions
On 04/14/2016 08:59 AM, Michael Mysinger via Python-Dev wrote:

> I am saying that if os.path.join now accepts RichPath objects, and those
> objects can return either str or bytes, then it's much harder to reason
> about when I have all bytes or all strings. In essence, you will force me
> to pre-wrap all RichPath objects in either os.fsencode(os.fspath(path)) or
> os.fsdecode(os.fspath(path)), just so I can reason about the type. And if
> I have to always do that wrapping then os.path.join doesn't need to accept
> RichPath objects and call fspath at all.

What many folks seem to be missing is that *you* (generic you) have
control of your data.

If you are not working at the bytes layer, you shouldn't be getting
bytes objects because:

- you specified str when asking for data from the OS, or
- you transformed the incoming bytes from whatever external source
  to str when you received them.

--
~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
On 04/14/2016 09:09 AM, Victor Stinner wrote:
> 2016-04-14 16:54 GMT+02:00 Ethan Furman:
>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>> path = os.path.join(pathlib_path, "str_path", direntry)
>>>
>>> (...)
>>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>>> just to make my life easier.
>>
>> This would be where we strongly disagree.
>
> FYI it's ok that we disagree on this point, at least I expressed my
> opinion ;-)

Absolutely. I appreciate you explaining your point of view.

> At least, we have now better identified a point of disagreement.

Agreed. :)

~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
Donald Stufft writes:
> > On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev wrote:
> >
> > In essence, you will force me to pre-wrap all RichPath objects in
> > either os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)),
> > just so I can reason about the type.
>
> This is only the case if you have a singular RichPath object that can
> represent both bytes and str (which is what DirEntry does, which I agree
> makes it harder… but that’s already the case with DirEntry.path).
> However that’s not the case if you have a bRichPath and uRichPath.

And you might even be able to retain your sanity if you enforce any
particular class to be either bRichPath or uRichPath. But if you do
that, then that still leaves DirEntry out in the cold, likely
converting to str in its __fspath__. Which leaves me in the camp that
bRichPath falls under YAGNI, and RichPath should be str only.
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016, at 12:05, Stephen J. Turnbull wrote:
> Random832 writes:
>
> > And what such incompatibilities exist between bytes and str for the
> > purpose of representing file paths?
>
> A plethora of encodings.

Only one encoding, fsencode/fsdecode. All other encodings are not for
filenames.

> > At the end of the day, there's exactly one answer to "what file on
> > disk this represents (or would represent if it existed)".
>
> Nope. Suppose those bytes were read from a file or a socket? It's
> dangerous to assume that encoding matches the file system's.

Why can I pass them to os.open, then, or to os.path.join so long as
everything else is also bytes? On UNIX, the filesystem is in bytes, so
saying that bytes can't match the filesystem is absurd. Converting it
to str with fsdecode will *always, absolutely, 100% of the time* give
a str that will address the same file that the bytes does (even if
it's "dangerous" to assume that was the name the user wanted, that's
beyond the scope of what the module is capable of dealing with).
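The round-trip part of this claim can be checked mechanically for the POSIX case. Assuming a UTF-8 filesystem encoding with the `surrogateescape` error handler (the usual configuration behind `os.fsdecode`/`os.fsencode` on Linux), decoding and re-encoding any byte string gives back the original bytes. The sketch calls the codec explicitly, via a hypothetical helper `fs_round_trips`, so it does not depend on the host's settings.

```python
def fs_round_trips(raw: bytes) -> bool:
    # Mirrors os.fsdecode()/os.fsencode() on a UTF-8 POSIX system.
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape") == raw


samples = [
    bytes(range(256)),   # every single byte value
    b"caf\xc3\xa9.txt",  # valid UTF-8
    b"caf\xe9.txt",      # invalid UTF-8 (a Latin-1 e-acute)
    b"\xed\xb2\x80",     # UTF-8-encoded surrogate, rejected by the decoder
]
print(all(fs_round_trips(s) for s in samples))  # True
```

This only shows that the mapping is lossless; it says nothing about whether the decoded `str` is the name a user intended, which is the point Stephen raises.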
Re: [Python-Dev] pathlib - current status of discussions
> On Apr 14, 2016, at 11:59 AM, Michael Mysinger via Python-Dev wrote:
>
> In essence, you will force me to pre-wrap all RichPath objects in
> either os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)),
> just so I can reason about the type.

This is only the case if you have a singular RichPath object that can
represent both bytes and str (which is what DirEntry does, which I
agree makes it harder… but that’s already the case with
DirEntry.path). However that’s not the case if you have a bRichPath
and uRichPath.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
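A sketch of Donald's point, assuming Python 3.6+ where `os.path.join` (here `posixpath.join`, for platform-independent output) calls `os.fspath()` on every argument. `bRichPath` and `uRichPath` are hypothetical classes whose `__fspath__` always returns one fixed type; as long as that holds, the result type of a join is predictable, and mixing the two fails just as mixing plain `bytes` and `str` does.

```python
import posixpath


class bRichPath:
    """Hypothetical rich path whose __fspath__ always returns bytes."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def __fspath__(self) -> bytes:
        return self._raw


class uRichPath:
    """Hypothetical rich path whose __fspath__ always returns str."""

    def __init__(self, text: str):
        self._text = text

    def __fspath__(self) -> str:
        return self._text


print(posixpath.join(uRichPath("a"), "b"))    # a/b
print(posixpath.join(bRichPath(b"a"), b"b"))  # b'a/b'
try:
    posixpath.join(uRichPath("a"), bRichPath(b"b"))
except TypeError:
    print("mixing str- and bytes-returning paths: TypeError")
```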
Re: [Python-Dev] pathlib - current status of discussions
2016-04-14 17:29 GMT+02:00 Ethan Furman:
> Interoperability with other systems and/or libraries. If we use
> surrogateescape to transform str to bytes, and the other side does not, we
> no longer have a workable path.

I guess that you mean a Python library?

When you exchange data with external programs or call C libraries,
Python is responsible for encoding Unicode to bytes with
os.fsencode(). The external part is not aware that Python uses
surrogateescape; it gets "regular" bytes.

I suggest treating such a Python library like external programs and
libraries: convert Unicode to bytes with os.fsencode(), but also
process paths as Unicode "inside" your application.

It's the basic rule for handling Unicode correctly in an application:
decode inputs as soon as possible, and encode back as late as
possible. Encode/decode at borders.

Victor
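Victor's "decode at the borders" rule, sketched for paths. `process_paths` is a hypothetical helper: bytes coming in from outside are decoded once with `os.fsdecode`, all internal work is done on `str`, and the result is encoded with `os.fsencode` only when it has to leave the program as bytes. `posixpath` is used so the output is the same on every platform.

```python
import os
import posixpath


def process_paths(raw_dir: bytes, raw_name: bytes) -> bytes:
    # Border (input): decode as soon as possible.
    directory = os.fsdecode(raw_dir)
    name = os.fsdecode(raw_name)

    # Inside the application: work only with str.
    full = posixpath.join(directory, name.lower())

    # Border (output): encode as late as possible.
    return os.fsencode(full)


print(process_paths(b"data", b"REPORT.txt"))  # b'data/report.txt'
```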
Re: [Python-Dev] pathlib - current status of discussions
2016-04-14 16:54 GMT+02:00 Ethan Furman:
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>> path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> (...)
>> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
>> just to make my life easier.
>
> This would be where we strongly disagree.

FYI it's ok that we disagree on this point, at least I expressed my
opinion ;-)

At least, we have now better identified a point of disagreement.

Victor
Re: [Python-Dev] pathlib - current status of discussions
Random832 writes:

> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths?

A plethora of encodings.

> At the end of the day, there's exactly one answer to "what file on
> disk this represents (or would represent if it existed)".

Nope. Suppose those bytes were read from a file or a socket? It's
dangerous to assume that encoding matches the file system's.
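Stephen's concern in a few lines: the same byte string stands for different text depending on which encoding is assumed, so bytes read from a file or socket only name the intended file if they were produced with the filesystem encoding. UTF-8 and Latin-1 are just example encodings; `ascii()` is used so the output prints safely on any terminal.

```python
raw = b"caf\xc3\xa9"  # "café" encoded as UTF-8

as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")

print(ascii(as_utf8))        # 'caf\xe9'
print(ascii(as_latin1))      # 'caf\xc3\xa9'
print(as_utf8 == as_latin1)  # False
```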
Re: [Python-Dev] pathlib - current status of discussions
Ethan Furman writes:
> On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> > In particular, one RichPath class might return bytes and another str,
> > or even worse the same class might sometimes return bytes and sometimes
> > str. When will os.path.join blow up due to mixing bytes and str and when
> > will it work in those situations?
>
> What are you asking here? ... Meaning allowing os.fspath()
> and __fspath__ to return either bytes or str will never cause the
> combination of bytes and str to work. Said another way: if you are
> using os.path.join then all the pieces have to be str or all the pieces
> have to be bytes.

I am saying that if os.path.join now accepts RichPath objects, and
those objects can return either str or bytes, then it's much harder to
reason about when I have all bytes or all strings. In essence, you
will force me to pre-wrap all RichPath objects in either
os.fsencode(os.fspath(path)) or os.fsdecode(os.fspath(path)), just so
I can reason about the type. And if I have to always do that wrapping
then os.path.join doesn't need to accept RichPath objects and call
fspath at all.
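The wrapping Michael describes can be written once as a pair of small helpers. `as_str_path` and `as_bytes_path` are hypothetical names; on Python 3.6+ `os.fsdecode`/`os.fsencode` already call `os.fspath` internally, so the explicit `os.fspath` call is kept only to mirror the expression in the message.

```python
import os
from pathlib import PurePosixPath


def as_str_path(path) -> str:
    # Normalise any str, bytes or os.PathLike object to str.
    return os.fsdecode(os.fspath(path))


def as_bytes_path(path) -> bytes:
    # Normalise any str, bytes or os.PathLike object to bytes.
    return os.fsencode(os.fspath(path))


print(as_str_path(b"a/b"))                # a/b
print(as_str_path(PurePosixPath("a/b")))  # a/b
print(as_bytes_path("a/b"))               # b'a/b'
```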
Re: [Python-Dev] pathlib - current status of discussions
On 04/14/2016 07:01 AM, Random832 wrote:
> On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
>> Adding integers and floats is considered "safe" because most people's
>> use of floats completely encompasses their use of ints. (You'll get
>> OverflowError if it can't be represented.) But float and Decimal are
>> considered "unsafe":
>>
>> --> 1.5 + decimal.Decimal("1.5")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'
>>
>> This is more what's happening here. Floats and Decimals can represent
>> similar sorts of things, but with enough incompatibilities that you
>> can't simply merge them.
>
> And what such incompatibilities exist between bytes and str for the
> purpose of representing file paths? At the end of the day, there's
> exactly one answer to "what file on disk this represents (or would
> represent if it existed)".

Interoperability with other systems and/or libraries. If we use
surrogateescape to transform str to bytes, and the other side does
not, we no longer have a workable path.

--
~Ethan~
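A small illustration of Ethan's interoperability point, using explicit codecs rather than the host's filesystem settings: a `str` produced with `surrogateescape` (as `os.fsdecode` does on UTF-8 POSIX systems) turns back into the original bytes only for code that uses the same error handler, while a consumer that encodes strictly rejects it.

```python
raw = b"caf\xe9.txt"  # not valid UTF-8
text = raw.decode("utf-8", "surrogateescape")  # contains the lone surrogate '\udce9'

# Same handler on the way out: the original bytes come back.
print(text.encode("utf-8", "surrogateescape") == raw)  # True

# A strict consumer (no surrogateescape) cannot encode the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)  # False
```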
Re: [Python-Dev] pathlib - current status of discussions
On 04/14/2016 06:56 AM, Victor Stinner wrote:
> 2016-04-14 15:40 GMT+02:00 Nick Coghlan:
>> Even earlier, Victor Stinner wrote:
>>> I consider that the final goal of the whole discussion is to support
>>> something like:
>>>
>>> path = os.path.join(pathlib_path, "str_path", direntry)
>>
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>> (...)
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>>
>> --> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>> (...)
>
> I don't understand. What is the point of adding a new __fspath__
> protocol to *implicitly* convert path objects to strings, if you still
> have to use an explicit conversion?

That's the crux of the issue -- some of us think the job of __fspath__
is to simply retrieve the inherent data from the pathy object, *not*
to do any implicit conversions.

> I would really expect that a high-level API like pathlib would solve
> encoding issues for me. IMHO DirEntry entries created by
> os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

Then let pathlib do it. As a high-level interface I have no issue with
pathlib converting DirEntry bytes objects to str using fsdecode (or
whatever makes sense); os.path.join (and by extension os.fspath and
__fspath__) should do no such thing.

>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
>
> This code is quite complex for a newbie, don't you think so?

A newbie should be using pathlib. If pathlib is not low-level enough,
then the newbie needs to learn about low-level stuff.

--
~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
On 04/14/2016 05:16 AM, Victor Stinner wrote:

> I consider that the final goal of the whole discussion is to support
> something like:
>
> path = os.path.join(pathlib_path, "str_path", direntry)
>
> Even if direntry uses a bytes filename. I expect genericpath.join() to
> be patched to use os.fspath(). If os.fspath() returns bytes,
> path.join() will fail with an annoying TypeError.
>
> I expect that DirEntry.__fspath__ uses os.fsdecode() to return str,
> just to make my life easier.

This would be where we strongly disagree. If pathlib, as a high-level
construct, wants to take that approach I have no issues, but the
functions in os are low-level and as such should not be changing data
types unless I ask for it.

I see __fspath__ as a retrieval mechanism, not a data-transformation
mechanism.

> You can apply the same rationale for the flavors 2 and 3
> (os.fspath(path, allow_bytes=True)). Indirectly, you will get a similar
> TypeError on os.path.join().

And that's fine. Low-level interfaces should not change data types
unless explicitly requested -- and we have fsencode() and fsdecode()
for that.

--
~Ethan~
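For reference, this is how Victor's example behaves on Python 3.6 and later, where `os.path.join` calls `os.fspath()` on each argument and `DirEntry.__fspath__` returns the entry's `path` unchanged, in whatever type `os.scandir` was given (the retrieval-only behaviour Ethan argues for). A POSIX system is assumed, and the temporary directory exists only to obtain a real `DirEntry`. A bytes-based entry makes the join fail; an explicit `os.fsdecode` fixes it.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    pathlib_path = Path(d)

    with os.scandir(os.fsencode(d)) as it:
        direntry = next(it)  # DirEntry built from a bytes path

    print(type(os.fspath(direntry)).__name__)  # bytes

    try:
        os.path.join(pathlib_path, "str_path", direntry)
    except TypeError:
        print("mixing str and bytes components: TypeError")

    # Explicit conversion. direntry.path is absolute here, so join
    # restarts from it; the point is only that the types now agree.
    path = os.path.join(pathlib_path, "str_path", os.fsdecode(direntry))
    print(path.endswith("a.txt"))  # True
```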
Re: [Python-Dev] pathlib - current status of discussions
On 04/14/2016 12:03 AM, Michael Mysinger via Python-Dev wrote:
> Brett Cannon writes:
>
> After playing with and considering the 4 possibilities, anything where
> __fspath__ can return bytes seems like insanity that flies in the face
> of everything Python 3 is trying to accomplish. In particular, one
> RichPath class might return bytes and another str, or even worse the
> same class might sometimes return bytes and sometimes str. When will
> os.path.join blow up due to mixing bytes and str and when will it work
> in those situations?

What are you asking here? Exactly where in os.path.join the exception will occur when mixing bytes & str, or whether mixing bytes & str will ever work?

The answer to the first is irrelevant (except for performance).

The answer to the second is always/never. Meaning allowing os.fspath() and __fspath__ to return either bytes or str will never cause the combination of bytes and str to work. Said another way: if you are using os.path.join then all the pieces have to be str or all the pieces have to be bytes.

--
~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
2016-04-14 15:40 GMT+02:00 Nick Coghlan:
>> I consider that the final goal of the whole discussion is to support
>> something like:
>>
>> path = os.path.join(pathlib_path, "str_path", direntry)
>
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
> (...)
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:
> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
> (...)

I don't understand. What is the point of adding a new __fspath__ protocol to *implicitly* convert path objects to strings, if you still have to use an explicit conversion?

I would really expect that a high-level API like pathlib would solve encoding issues for me. IMHO DirEntry entries created by os.scandir(bytes) must use os.fsdecode() in their __fspath__ method.

os.path.join() is just one example of an operation on multiple paths. Look at os.path for other examples ;-)

> os.path.join(*map(os.fsdecode, ("str", b"bytes")))

This code is quite complex for a newbie, don't you think so?

My example was os.path.join(pathlib_path, "str_path", direntry), where we can do something to make the API easier to use. I don't propose to do anything for os.path.join("str", b"bytes"), which would continue to fail with TypeError, *as expected*.

Victor
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016, at 09:50, Chris Angelico wrote:
> Adding integers and floats is considered "safe" because most people's
> use of floats completely encompasses their use of ints. (You'll get
> OverflowError if it can't be represented.) But float and Decimal are
> considered "unsafe":
>
> >>> 1.5 + decimal.Decimal("1.5")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'
>
> This is more what's happening here. Floats and Decimals can represent
> similar sorts of things, but with enough incompatibilities that you
> can't simply merge them.

And what such incompatibilities exist between bytes and str for the purpose of representing file paths? At the end of the day, there's exactly one answer to "what file on disk this represents (or would represent if it existed)".
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
> That's not a *new* problem though, it already exists if you pass in a
> mix of bytes and str:
>
> There's also already a solution (regardless of whether you want bytes
> or str as the result), which is to explicitly coerce all the arguments
> to the same type:

It'd be nice if that went away. Having to do that makes about as much sense to me as if you had to explicitly coerce an int to a float to add them together. Sure, explicit is better than implicit, but there are limits. You're explicitly calling os.path.join; isn't that explicit enough?
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016 at 11:45 PM, Random832 wrote:
> On Thu, Apr 14, 2016, at 09:40, Nick Coghlan wrote:
>> That's not a *new* problem though, it already exists if you pass in a
>> mix of bytes and str:
>>
>> There's also already a solution (regardless of whether you want bytes
>> or str as the result), which is to explicitly coerce all the arguments
>> to the same type:
>
> It'd be nice if that went away. Having to do that makes about as much
> sense to me as if you had to explicitly coerce an int to a float to add
> them together. Sure, explicit is better than implicit, but there are
> limits. You're explicitly calling os.path.join; isn't that explicit
> enough?

Adding integers and floats is considered "safe" because most people's use of floats completely encompasses their use of ints. (You'll get OverflowError if it can't be represented.) But float and Decimal are considered "unsafe":

>>> 1.5 + decimal.Decimal("1.5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'decimal.Decimal'

This is more what's happening here. Floats and Decimals can represent similar sorts of things, but with enough incompatibilities that you can't simply merge them.

ChrisA
Re: [Python-Dev] pathlib - current status of discussions
On 14 April 2016 at 22:16, Victor Stinner wrote:
> 2016-04-13 19:10 GMT+02:00 Brett Cannon:
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
>> four potential approaches implemented (although it doesn't follow the
>> "separate functions" approach some are proposing and instead goes with the
>> allow_bytes approach I originally proposed).
>
> IMHO the best argument against the flavor 4 (fspath: str or bytes
> allowed) is the os.path.join() function.
>
> I consider that the final goal of the whole discussion is to support
> something like:
>
> path = os.path.join(pathlib_path, "str_path", direntry)

That's not a *new* problem though, it already exists if you pass in a mix of bytes and str:

>>> import os.path
>>> os.path.join("str", b"bytes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/posixpath.py", line 89, in join
    "components") from None
TypeError: Can't mix strings and bytes in path components

There's also already a solution (regardless of whether you want bytes or str as the result), which is to explicitly coerce all the arguments to the same type:

>>> os.path.join(*map(os.fsdecode, ("str", b"bytes")))
'str/bytes'
>>> os.path.join(*map(os.fsencode, ("str", b"bytes")))
b'str/bytes'

Assuming os.fsdecode and os.fsencode are updated to call os.fspath on their argument before continuing with the current logic, the latter two forms would both start automatically handling both DirEntry and pathlib objects, while the first form would continue to throw TypeError if handed an unexpected bytes value (whether directly or via an __fspath__ call).

Cheers,
Nick.
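Nick's assumption did hold: in Python 3.6, os.fsencode() and os.fsdecode() gained support for os.PathLike objects. A minimal sketch wrapping his coercion idiom in two helpers, assuming Python 3.6 or later; join_as_str and join_as_bytes are hypothetical names, not stdlib functions:

```python
import os
import pathlib


def join_as_str(*parts):
    """Join path components after explicitly coercing each one to str.

    os.fsdecode() runs os.fspath() first (Python 3.6+), so str, bytes,
    pathlib objects and os.DirEntry objects are all accepted.
    """
    return os.path.join(*map(os.fsdecode, parts))


def join_as_bytes(*parts):
    """Same idea, but coerce every component to bytes with os.fsencode()."""
    return os.path.join(*map(os.fsencode, parts))


# A pathlib object, a str and a bytes component in a single call.
mixed = (pathlib.PurePath("base"), "str_path", b"bytes_name")
as_text = join_as_str(*mixed)
as_bytes = join_as_bytes(*mixed)
```

Without the explicit coercion, os.path.join() still raises TypeError for the same mix, exactly as in the session above.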
Re: [Python-Dev] pathlib - current status of discussions
2016-04-13 19:10 GMT+02:00 Brett Cannon:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

IMHO the best argument against flavor 4 (fspath: str or bytes allowed) is the os.path.join() function.

I consider that the final goal of the whole discussion is to support something like:

    path = os.path.join(pathlib_path, "str_path", direntry)

even if direntry uses a bytes filename. I expect genericpath.join() to be patched to use os.fspath(). If os.fspath() returns bytes, path.join() will fail with an annoying TypeError.

I expect that DirEntry.__fspath__ uses os.fsdecode() to return str, just to make my life easier.

I recall that I used to say that Python 2 doesn't support Unicode filenames because os.path.join() raises a UnicodeDecodeError when you try to join a Unicode filename with a byte filename which contains non-ASCII bytes. The problem occurs indirectly in code using hardcoded paths, Unicode or bytes paths. Saying that "Python 2 doesn't support Unicode filenames" is wrong, but since Unicode is a hard problem, I tried to simplify my explanation :-)

You can apply the same rationale to flavors 2 and 3 (os.fspath(path, allow_bytes=True)). Indirectly, you will get a similar TypeError on os.path.join().

Victor
Re: [Python-Dev] pathlib - current status of discussions
Brett Cannon writes:
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the
> four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

Thanks Brett, it is definitely a start! Maybe I am just more unimaginative than most, but since interoperability is the goal, I would ideally be able to play with a full implementation where all the stdlib functions Nick originally mentioned accepted these "rich path" objects. However, for concrete example purposes, maybe it is sufficient to start with your fspath function, a toy RichPath class implementing __fspath__, and something like os.path.join, which is a meaty enough example to test some of the functionality.

I posted a gist of a string-only example at https://gist.github.com/mmysinger/0b5ae2cfb866f7013c387a2683c7fc39

After playing with and considering the 4 possibilities, anything where __fspath__ can return bytes seems like insanity that flies in the face of everything Python 3 is trying to accomplish. In particular, one RichPath class might return bytes and another str, or even worse the same class might sometimes return bytes and sometimes str. When will os.path.join blow up due to mixing bytes and str and when will it work in those situations?

So for me that eliminates #3 and #4. Also, version #2 (accepting bytes in os.fspath) felt like it could be a very minor convenience, but even the str-only version #1 only requires one isinstance check in the rare case you need to also deal with bytes (see the os.path.join example in the gist above). So I lean toward the str-only #1 version.

In any case I would start with the strict str-only full implementation and loosen it either in 3.6 or 3.7 depending on what people think after actually using it.
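For concreteness, a minimal sketch of what version #1 plus "one isinstance check" might look like. This is not the code in Michael's gist; fspath_v1, rich_join and RichPath are hypothetical names, and looking __fspath__ up on the type is an assumption modelled on how special methods are usually resolved:

```python
import os


def fspath_v1(path):
    """Sketch of version #1: the result is always str."""
    if isinstance(path, str):
        return path
    try:
        fsp = type(path).__fspath__  # special-method style lookup on the type
    except AttributeError:
        raise TypeError("expected str or path-like object, not "
                        + type(path).__name__) from None
    result = fsp(path)
    if not isinstance(result, str):
        raise TypeError("__fspath__() must return str, not "
                        + type(result).__name__)
    return result


def rich_join(*parts):
    """os.path.join() layered on top of the str-only fspath_v1()."""
    # The one isinstance() check: an all-bytes call goes straight to
    # os.path.join(); mixing bytes with anything else still raises TypeError.
    if parts and isinstance(parts[0], bytes):
        return os.path.join(*parts)
    return os.path.join(*map(fspath_v1, parts))


class RichPath:
    """Toy rich path object implementing the protocol."""
    def __init__(self, value):
        self._value = value

    def __fspath__(self):
        return self._value
```

With this shape, rich_join(RichPath("a"), "b") and rich_join(b"a", b"b") both work, while a bytes component mixed into a str call is rejected.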
Re: [Python-Dev] pathlib - current status of discussions
On 14 April 2016 at 14:05, Random832 wrote:
> On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote:
>> In this kind of case, inheritance tends to trump protocol. For
>> example, int subclasses can't override operator.index:
> ...
>> The reasons for that behaviour are more pragmatic than philosophical:
>> builtins and their subclasses are extensively special-cased for speed
>> reasons, and those shortcuts are encountered before the interpreter
>> even considers using the general protocol.
>>
>> In cases where the magic method return types are polymorphic (so
>> subclasses may want to override them) we'll use more restrictive exact
>> type checks for the shortcuts, but that argument doesn't apply for
>> typechecked protocols where the result is required to be an instance
>> of a particular builtin type (but subclasses are considered
>> acceptable).
>
> Then why aren't we doing it for str? Because "try: path =
> path.__fspath__()" is more idiomatic than the alternative?

The sketches Brett posted will bear little resemblance to the actual implementation - that will be in C and use similar idioms to those we use for other abstract protocols (such as shortcuts for instances of builtin types, and doing the method lookup via the passed in object's type, rather than on the instance).

Cheers,
Nick.
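Looking the method up via the object's type rather than the instance only makes a visible difference when an instance attribute shadows the method. A small sketch (PathHolder, lookup_on_type and lookup_on_instance are hypothetical names); the os.fspath() that shipped in Python 3.6 uses the type-based lookup:

```python
import os


class PathHolder:
    """Hypothetical class implementing __fspath__ on the class itself."""
    def __fspath__(self):
        return "from-type"


def lookup_on_type(obj):
    # Resolve the method on type(obj), then call it with obj, the way the
    # interpreter resolves special methods.
    return type(obj).__fspath__(obj)


def lookup_on_instance(obj):
    # Ordinary attribute access consults the instance __dict__ first.
    return obj.__fspath__()


p = PathHolder()
p.__fspath__ = lambda: "from-instance"  # shadow the method on the instance
```

Here lookup_on_type(p) and os.fspath(p) ignore the instance attribute, while lookup_on_instance(p) picks it up.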
Re: [Python-Dev] pathlib - current status of discussions
On Wed, Apr 13, 2016, at 23:27, Nick Coghlan wrote:
> In this kind of case, inheritance tends to trump protocol. For
> example, int subclasses can't override operator.index:
...
> The reasons for that behaviour are more pragmatic than philosophical:
> builtins and their subclasses are extensively special-cased for speed
> reasons, and those shortcuts are encountered before the interpreter
> even considers using the general protocol.
>
> In cases where the magic method return types are polymorphic (so
> subclasses may want to override them) we'll use more restrictive exact
> type checks for the shortcuts, but that argument doesn't apply for
> typechecked protocols where the result is required to be an instance
> of a particular builtin type (but subclasses are considered
> acceptable).

Then why aren't we doing it for str? Because "try: path = path.__fspath__()" is more idiomatic than the alternative?

If some sort of reasoned decision has been made to require the protocol to trump the special case for str subclasses, it's unreasonable not to apply the same decision to bytes subclasses. The decision should be "always use the protocol first" or "always use the type match first".

In other words, why not this:

    def fspath(path, *, allow_bytes=False):
        if isinstance(path, (bytes, str) if allow_bytes else str):
            return path
        try:
            m = path.__fspath__
        except AttributeError:
            raise TypeError
        path = m()
        if isinstance(path, (bytes, str) if allow_bytes else str):
            return path
        raise TypeError
Re: [Python-Dev] pathlib - current status of discussions
On 14 April 2016 at 13:14, Ethan Furman wrote:
> On 04/13/2016 07:57 PM, Nikolaus Rath wrote:
>> Either I haven't understood your answer, or you haven't understood my
>> question. I'm concerned about this case:
>>
>>     class Special(bytes):
>>         def __fspath__(self):
>>             return 'str-val'
>>     obj = Special('bytes-val', 'utf8')
>>     path_obj = fspath(obj, allow_bytes=True)
>>
>> With #2, path_obj == b'bytes-val'. With #3, path_obj == 'str-val'.
>
> I misunderstood your question. That is... an interesting case. ;)

In this kind of case, inheritance tends to trump protocol. For example, int subclasses can't override operator.index:

>>> from operator import index
>>> class NotAnInt():
...     def __index__(self):
...         return 42
...
>>> index(NotAnInt())
42
>>> class MyInt(int):
...     def __index__(self):
...         return 42
...
>>> index(MyInt(53))
53

The reasons for that behaviour are more pragmatic than philosophical: builtins and their subclasses are extensively special-cased for speed reasons, and those shortcuts are encountered before the interpreter even considers using the general protocol.

In cases where the magic method return types are polymorphic (so subclasses may want to override them) we'll use more restrictive exact type checks for the shortcuts, but that argument doesn't apply for typechecked protocols where the result is required to be an instance of a particular builtin type (but subclasses are considered acceptable).

Cheers,
Nick.
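For comparison, the os.fspath() that eventually shipped in Python 3.6 resolves Nikolaus's case the way Nick describes for builtins: str and bytes objects, including subclasses, are returned unchanged before __fspath__ is looked up. A quick check, assuming Python 3.6 or later:

```python
import os


class Special(bytes):
    """bytes subclass that also defines __fspath__ (the case from this thread)."""
    def __fspath__(self):
        return 'str-val'


obj = Special('bytes-val', 'utf8')
# The isinstance(bytes) shortcut wins, so Special.__fspath__ is never called.
result = os.fspath(obj)
```

The result is the original object, which compares equal to b'bytes-val'.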
Re: [Python-Dev] pathlib - current status of discussions
On 04/13/2016 07:57 PM, Nikolaus Rath wrote:
> On Apr 13 2016, Ethan Furman wrote:
>> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:
>>> When passing an object that is of type str and has a __fspath__
>>> attribute, all approaches return the value of __fspath__().
>>>
>>> However, when passing something of type bytes, the second approach
>>> returns the object, while the third returns the value of __fspath__().
>>>
>>> Is this intentional? I think a __fspath__ attribute should always be
>>> preferred.
>>
>> Yes, it is intentional. The second approach assumes __fspath__ can
>> only return str, so there is no point in checking it for bytes.
>
> Either I haven't understood your answer, or you haven't understood my
> question. I'm concerned about this case:
>
>     class Special(bytes):
>         def __fspath__(self):
>             return 'str-val'
>     obj = Special('bytes-val', 'utf8')
>     path_obj = fspath(obj, allow_bytes=True)
>
> With #2, path_obj == b'bytes-val'. With #3, path_obj == 'str-val'.

I misunderstood your question. That is... an interesting case. ;)

--
~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
On Apr 13 2016, Ethan Furman wrote:
> On 04/13/2016 03:45 PM, Nikolaus Rath wrote:
>
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
> Yes, it is intentional. The second approach assumes __fspath__ can
> only return str, so there is no point in checking it for bytes.

Either I haven't understood your answer, or you haven't understood my question. I'm concerned about this case:

    class Special(bytes):
        def __fspath__(self):
            return 'str-val'
    obj = Special('bytes-val', 'utf8')
    path_obj = fspath(obj, allow_bytes=True)

With #2, path_obj == b'bytes-val'. With #3, path_obj == 'str-val'.

I would expect that fspath(obj, allow_bytes=True) == 'str-val' (after all, it's allow_bytes, not require_bytes).

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
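The two behaviours Nikolaus describes can be reproduced with a minimal sketch. fspath_v2 and fspath_v3 are hypothetical reconstructions from the descriptions in this thread (Nikolaus's summary and Brett's explanation), not the code in Brett's gist: #2 lets bytes pass through before __fspath__ is consulted and requires __fspath__ to return str, while #3 always tries __fspath__ first and accepts a bytes result when allow_bytes=True.

```python
def fspath_v2(path, *, allow_bytes=False):
    """Approach #2 (sketch): __fspath__ may only return str."""
    # bytes, including bytes subclasses, pass straight through, so a
    # bytes subclass never has its __fspath__ consulted.
    if allow_bytes and isinstance(path, bytes):
        return path
    try:
        fsp = type(path).__fspath__
    except AttributeError:
        pass
    else:
        path = fsp(path)
    if isinstance(path, str):
        return path
    raise TypeError("unsupported path type: " + type(path).__name__)


def fspath_v3(path, *, allow_bytes=False):
    """Approach #3 (sketch): __fspath__ may return str or bytes."""
    # The protocol is always tried first, for str and bytes subclasses alike.
    try:
        fsp = type(path).__fspath__
    except AttributeError:
        pass
    else:
        path = fsp(path)
    accepted = (str, bytes) if allow_bytes else (str,)
    if isinstance(path, accepted):
        return path
    raise TypeError("unsupported path type: " + type(path).__name__)


class Special(bytes):
    def __fspath__(self):
        return 'str-val'


obj = Special('bytes-val', 'utf8')
```

With these sketches, fspath_v2(obj, allow_bytes=True) returns obj itself (equal to b'bytes-val'), while fspath_v3(obj, allow_bytes=True) returns 'str-val'.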
Re: [Python-Dev] pathlib - current status of discussions
On Apr 13, 2016 19:06, Brett Cannon wrote:
> On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath wrote:
>> When passing an object that is of type str and has a __fspath__
>> attribute, all approaches return the value of __fspath__().
>>
>> However, when passing something of type bytes, the second approach
>> returns the object, while the third returns the value of __fspath__().
>>
>> Is this intentional? I think a __fspath__ attribute should always be
>> preferred.
>
> It's very much intentional. If we define __fspath__() to only return strings
> but still want to minimize boilerplate of allowing bytes to simply pass
> through without checking a path argument to see if it is bytes then approach
> #2 is warranted. But if __fspath__() can return bytes then approach #3 allows
> for it.

Er, the difference comes in when the object passed to os.fspath is a subclass of bytes that, itself, has a __fspath__ method (which may return a str). It's unlikely to occur in the wild, but is a semantic difference between this case and all other objects with __fspath__ methods.
Re: [Python-Dev] pathlib - current status of discussions
On Apr 13, 2016 20:06, Chris Barker wrote:
>
> In this case, I don't know that we need to be tolerant of buggy
> __fspath__() implementations -- they should be tested outside these
> checks, and not be buggy. So a buggy implementation may raise and may be
> ignored, depending on what Exception the bug triggers -- big deal. The
> only time it would matter is when the implementer is debugging the
> implementation.
>
> -CHB

Yes, but you can often, and can in this case, restrict the contents of the try block to a single operation - a name access, an attribute lookup, a subscript - and that sharply limits the risk of such a thing happening. Sure, the object's __getattr(ibute)__ could still fail from something deep inside it missing a different attribute, but that's it.
Re: [Python-Dev] pathlib - current status of discussions
On 04/13/2016 05:06 PM, Chris Barker wrote:
> In this case, I don't know that we need to be tolerant of buggy
> __fspath__() implementations -- they should be tested outside these
> checks, and not be buggy. So a buggy implementation may raise and may
> be ignored, depending on what Exception the bug triggers -- big deal.
> The only time it would matter is when the implementer is debugging the
> implementation.

Yet the idea behind robust exception handling is to test as little as possible and only catch what you know how to correct. This code catches only one thing, only at one place, and we know how to deal with it:

    try:
        fsp = obj.__fspath__
    except AttributeError:
        pass
    else:
        fsp = fsp()

Contrarily, this next code catches the same error, but it could happen at the one place we know how to deal with it *or* anywhere further down the call stack where we have no clue what the proper course is to handle the problem... yet we suppress it anyway:

    try:
        fsp = obj.__fspath__()
    except AttributeError:
        pass

Certainly not code I want to see in the stdlib.

--
~Ethan~
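The difference between Ethan's two snippets shows up as soon as a __fspath__ implementation has a bug that itself raises AttributeError. A minimal sketch; BuggyPath, narrow and broad are hypothetical names:

```python
class BuggyPath:
    """Hypothetical path class whose __fspath__ touches an attribute that
    was never set, so calling it raises AttributeError."""
    def __fspath__(self):
        return self._missing


def narrow(obj):
    # Only the attribute lookup is guarded; errors raised *inside*
    # __fspath__ propagate to the caller.
    try:
        fsp = obj.__fspath__
    except AttributeError:
        return None  # obj does not implement the protocol
    else:
        return fsp()


def broad(obj):
    # The call sits inside the try block too, so the bug above is
    # indistinguishable from "no __fspath__ at all".
    try:
        return obj.__fspath__()
    except AttributeError:
        return None
```

broad(BuggyPath()) quietly returns None, while narrow(BuggyPath()) surfaces the AttributeError from inside the buggy method.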
Re: [Python-Dev] pathlib - current status of discussions
On Wed, Apr 13, 2016 at 1:47 PM, Random832 wrote:
> On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote:
> > so are we worried that __fspath__ will exist and be callable, but might
> > raise an AttributeError somewhere inside itself? if so isn't it broken
> > anyway, so should it be ignored?
>
> Well, if you're going to say "ignore the protocol because it's broken",
> where do you stop? What if it raises some other exception? What if it
> raises SystemExit?

This is pretty much always the case with EAFP coding:

    try:
        something()
    except SomeError:
        do_something_else()

Unless SomeError is a custom defined error that you know is never going to get raised anywhere else, something() could raise SomeError for the reason you expect, or some code deep in the call stack could raise SomeError also, and you wouldn't know that.

I had a student run into this and it took him a good while to debug it. But that was because the code in something() was pretty darn buggy. If he had tested something() by itself, there would have been no issue finding the problem.

In this case, I don't know that we need to be tolerant of buggy __fspath__() implementations -- they should be tested outside these checks, and not be buggy. So a buggy implementation may raise and may be ignored, depending on what Exception the bug triggers -- big deal. The only time it would matter is when the implementer is debugging the implementation.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [Python-Dev] pathlib - current status of discussions
On Wed, 13 Apr 2016 at 15:20 Victor Stinner wrote:
> Oh, since others voted, I will also vote and explain my vote.
>
> I like choice 1, str only, because it's very well defined. In Python
> 3, Unicode is simply the native type for text. It's accepted by almost
> all functions. In other emails, I also explained that Unicode is fine
> to store undecodable filenames on UNIX; it has worked as expected for
> many years (since Python 3.3).
>
> --
>
> If you cannot survive without bytes, I suggest adding two functions:
> one for str only, another which can return str or bytes.
>
> Maybe you want in fact two protocols: __fspath__ (str only) and
> __fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or
> fall back to os.fsencode(__fspath__). os.fspath() would first try
> __fspath__, or fall back to os.fsdecode(__fspathb__). IMHO it's not
> worth having such complexity while Unicode handles all use cases.

Implementing two magic methods for this seems like overkill. Best I would be willing to do with automatic encode/decode is use os.fsencode()/os.fsdecode() on the argument or what __fspath__() returned.

>
> Or do you know functions implemented in Python accepting str *and* bytes?

On purpose, nothing off the top of my head.

> --
>
> The C implementation of the os module has an important
> path_converter() function:
>
>  * path_converter accepts (Unicode) strings and their
>  * subclasses, and bytes and their subclasses. What
>  * it does with the argument depends on the platform:
>  *
>  *   * On Windows, if we get a (Unicode) string we
>  *     extract the wchar_t * and return it; if we get
>  *     bytes we extract the char * and return that.
>  *
>  *   * On all other platforms, strings are encoded
>  *     to bytes using PyUnicode_FSConverter, then we
>  *     extract the char * from the bytes object and
>  *     return that.
>
> This function will implement something like os.fspath().
>
> With os.fspath() only accepting str, we will directly return the
> Unicode string on Windows. On UNIX, Unicode will be encoded, as it's
> already done for Unicode strings.
>
> This specific function would benefit from flavor 4 (os.fspath() can
> return str and bytes), but it's more the exception than the rule. It
> would be more of a micro-optimization than a good reason to drive the
> API design.

Yep, it's interesting to know but Chris and I won't let it drive the decision (I assume).

-Brett

>
> Victor
>
> On Wednesday 13 April 2016, Brett Cannon wrote:
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1
> > has the four potential approaches implemented (although it doesn't follow
> > the "separate functions" approach some are proposing and instead goes with
> > the allow_bytes approach I originally proposed).
Re: [Python-Dev] pathlib - current status of discussions
On Wed, 13 Apr 2016 at 15:46 Nikolaus Rath wrote:
> On Apr 13 2016, Brett Cannon wrote:
> > On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> > python-dev@python.org> wrote:
> >
> >> Ethan Furman writes:
> >>
> >> > Do we allow bytes to be returned from os.fspath()? If yes, then do we
> >> > allow bytes from __fspath__()?
> >>
> >> De-lurking. Especially since the ultimate goal is better interoperability,
> >> I feel like an implementation that people can play with would help guide
> >> the few remaining decisions. To help test the various options you could
> >> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to
> >> both pathlib.__fspath__() and os.fspath(), with distinct configurable
> >> defaults for each.
> >>
> >> In the spirit of Python 3 I feel like bytes might not be needed in
> >> practice, but something like this with defaults of False will allow
> >> people to easily test all the various options.
> >
> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> > the four potential approaches implemented (although it doesn't follow the
> > "separate functions" approach some are proposing and instead goes with the
> > allow_bytes approach I originally proposed).
>
> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.

It's very much intentional. If we define __fspath__() to only return strings but still want to minimize boilerplate of allowing bytes to simply pass through without checking a path argument to see if it is bytes then approach #2 is warranted. But if __fspath__() can return bytes then approach #3 allows for it.
Re: [Python-Dev] pathlib - current status of discussions
On 04/13/2016 03:45 PM, Nikolaus Rath wrote:
> When passing an object that is of type str and has a __fspath__
> attribute, all approaches return the value of __fspath__().
>
> However, when passing something of type bytes, the second approach
> returns the object, while the third returns the value of __fspath__().
>
> Is this intentional? I think a __fspath__ attribute should always be
> preferred.

Yes, it is intentional. The second approach assumes __fspath__ can only return str, so there is no point in checking it for bytes.

--
~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
On Apr 13 2016, Brett Cannon wrote:
> On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <
> python-dev@python.org> wrote:
>
>> Ethan Furman writes:
>>
>> > Do we allow bytes to be returned from os.fspath()? If yes, then do we
>> > allow bytes from __fspath__()?
>>
>> De-lurking. Especially since the ultimate goal is better interoperability,
>> I feel like an implementation that people can play with would help guide
>> the few remaining decisions. To help test the various options you could
>> temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to
>> both pathlib.__fspath__() and os.fspath(), with distinct configurable
>> defaults for each.
>>
>> In the spirit of Python 3 I feel like bytes might not be needed in
>> practice, but something like this with defaults of False will allow
>> people to easily test all the various options.
>
> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has
> the four potential approaches implemented (although it doesn't follow the
> "separate functions" approach some are proposing and instead goes with the
> allow_bytes approach I originally proposed).

When passing an object that is of type str and has a __fspath__ attribute, all approaches return the value of __fspath__().

However, when passing something of type bytes, the second approach returns the object, while the third returns the value of __fspath__().

Is this intentional? I think a __fspath__ attribute should always be preferred.

Best,
-Nikolaus
Re: [Python-Dev] pathlib - current status of discussions
Oh, since others voted, I will also vote and explain my vote. I like choice 1, str only, because it's very well defined. In Python 3, Unicode is simply the native type for text. It's accepted by almost all functions. In other emails, I also explained that Unicode is fine for storing undecodable filenames on UNIX; it has worked as expected for many years (since Python 3.3). -- If you cannot survive without bytes, I suggest adding two functions: one for str only, another which can return str or bytes. Maybe you in fact want two protocols: __fspath__ (str only) and __fspathb__ (bytes only)? os.fspathb() would first try __fspathb__, or fall back to os.fsencode(__fspath__). os.fspath() would first try __fspath__, or fall back to os.fsdecode(__fspathb__). IMHO it's not worth having such complexity when Unicode handles all use cases. Or do you know of functions implemented in Python that accept both str *and* bytes? -- The C implementation of the os module has an important path_converter() function: * path_converter accepts (Unicode) strings and their subclasses, and bytes and their subclasses. What it does with the argument depends on the platform: * On Windows, if we get a (Unicode) string we extract the wchar_t * and return it; if we get bytes we extract the char * and return that. * On all other platforms, strings are encoded to bytes using PyUnicode_FSConverter, then we extract the char * from the bytes object and return that. This function will implement something like os.fspath(). With os.fspath() only accepting str, we will return the Unicode string directly on Windows. On UNIX, Unicode will be encoded, as is already done for Unicode strings. This specific function would benefit from flavor 4 (os.fspath() can return str and bytes), but it's more the exception than the rule. It would be more of a micro-optimization than a good reason to drive the API design.
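A minimal sketch of the two-protocol idea, assuming a hypothetical `__fspathb__` dunder alongside `__fspath__`. The names `fspath_str`, `fspath_bytes`, `__fspathb__`, `StrOnlyPath` and `BytesOnlyPath` are illustrative only; `os.fsencode()` and `os.fsdecode()` are the real conversion functions the fallback would use.

```python
import os


def fspath_str(path):
    """Hypothetical os.fspath(): str only; falls back to decoding __fspathb__()."""
    if isinstance(path, str):
        return path
    cls = type(path)
    if hasattr(cls, '__fspath__'):
        return cls.__fspath__(path)            # result type checks omitted for brevity
    if hasattr(cls, '__fspathb__'):
        return os.fsdecode(cls.__fspathb__(path))
    raise TypeError(f'not a path-like object: {cls.__name__}')


def fspath_bytes(path):
    """Hypothetical os.fspathb(): bytes only; falls back to encoding __fspath__()."""
    if isinstance(path, bytes):
        return path
    cls = type(path)
    if hasattr(cls, '__fspathb__'):
        return cls.__fspathb__(path)
    if hasattr(cls, '__fspath__'):
        return os.fsencode(cls.__fspath__(path))
    raise TypeError(f'not a path-like object: {cls.__name__}')


class StrOnlyPath:
    def __fspath__(self):
        return 'data/report.txt'


class BytesOnlyPath:
    def __fspathb__(self):
        return b'data/report.txt'


print(fspath_bytes(StrOnlyPath()))   # b'data/report.txt'
print(fspath_str(BytesOnlyPath()))   # data/report.txt
```

Each function prefers its own protocol and only converts when the other one is all that is available, which is the extra machinery Victor argues is not worth it.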
Victor On Wednesday, 13 April 2016, Brett Cannon wrote: > > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the > four potential approaches implemented (although it doesn't follow the > "separate functions" approach some are proposing and instead goes with the > allow_bytes approach I originally proposed).
Re: [Python-Dev] pathlib - current status of discussions
On Wed, Apr 13, 2016, at 16:39, Chris Barker wrote: > so are we worried that __fspath__ will exist and be callable, but might > raise an AttributeError somewhere inside itself? if so isn't it broken > anyway, so should it be ignored? Well, if you're going to say "ignore the protocol because it's broken", where do you stop? What if it raises some other exception? What if it raises SystemExit?
Re: [Python-Dev] pathlib - current status of discussions
On Wed, 13 Apr 2016 at 13:40 Chris Barker wrote: > so are we worried that __fspath__ will exist and be callable, but might > raise an AttributeError somewhere inside itself? if so isn't it broken > anyway, so should it be ignored? > It should propagate instead of swallowing up the exception, otherwise it's hard to debug why __fspath__ seems to be ignored. > > and I know it's asking permission rather than forgiveness, but what's > wrong with: > > if hasattr(path, "__fspath__"): > path = path.__fspath__() > > if you really want to check for the existence of the attribute first? > > Nothing. > or even: > > path = path.__fspath__() if hasattr(path, "__fspath__") else path > > That also works. > > (OK, really a Pythonic style question now) > Yes, this is getting a bit side-tracked over some example code to just get a concept across. -Brett
Re: [Python-Dev] pathlib - current status of discussions
so are we worried that __fspath__ will exist and be callable, but might raise an AttributeError somewhere inside itself? if so isn't it broken anyway, so should it be ignored? and I know it's asking permission rather than forgiveness, but what's wrong with: if hasattr(path, "__fspath__"): path = path.__fspath__() if you really want to check for the existence of the attribute first? or even: path = path.__fspath__() if hasattr(path, "__fspath__") else path (OK, really a Pythonic style question now) -CHB On Wed, Apr 13, 2016 at 12:54 PM, Brett Cannon wrote: > > > On Wed, 13 Apr 2016 at 12:39 Fred Drake wrote: > >> On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico wrote: >> > Is that the intention, or should the exception catching be narrower? I >> > know it's clunky to write it in Python, but AIUI it's less so in C: >> > >> > try: >> > callme = path.__fspath__ >> > except AttributeError: >> > pass >> > else: >> > path = callme() >> >> +1 for this variant; I really don't like masking errors inside the >> __fspath__ implementation. >> > > Don't read too much into the code in that gist. I just did them quickly to > get the point across of the proposals in terms of str/bytes, not what will > be proposed in any final patch. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov
Re: [Python-Dev] pathlib - current status of discussions
On Wed, 13 Apr 2016 at 12:39 Fred Drake wrote: > On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico wrote: > > Is that the intention, or should the exception catching be narrower? I > > know it's clunky to write it in Python, but AIUI it's less so in C: > > > > try: > > callme = path.__fspath__ > > except AttributeError: > > pass > > else: > > path = callme() > > +1 for this variant; I really don't like masking errors inside the > __fspath__ implementation. > Don't read too much into the code in that gist. I just did them quickly to get the point across of the proposals in terms of str/bytes, not what will be proposed in any final patch.
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016 at 5:46 AM, Random832 wrote: > On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote: >> Is that the intention, or should the exception catching be narrower? I >> know it's clunky to write it in Python, but AIUI it's less so in C: > > How is it less so in C? You lose the ability to PyObject_CallMethod. I might be wrong, then. I wasn't sure how it was all implemented. Anyway, it's a correctness thing, not a simplicity one, so even if it is clunkier, it ought to be the case. And that is the intention, so we're fine. ChrisA
Re: [Python-Dev] pathlib - current status of discussions
On Wed, Apr 13, 2016, at 15:24, Chris Angelico wrote: > Is that the intention, or should the exception catching be narrower? I > know it's clunky to write it in Python, but AIUI it's less so in C: How is it less so in C? You lose the ability to PyObject_CallMethod.
Re: [Python-Dev] pathlib - current status of discussions
On 4/13/2016 13:49, Ethan Furman wrote: > Number 3: it allows bytes, but only when told it's okay to do so. Having code get a bytes object when one is not expected is not a headache we need to inflict on anyone. This is an artifact of the other needless restrictions I said I wouldn't rant about. I think it is in everyone's best interest not to perpetuate those needless restrictions.
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016 at 5:30 AM, Brett Cannon wrote: > > > On Wed, 13 Apr 2016 at 12:25 Chris Angelico wrote: >> >> On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon wrote: >> > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has >> > the >> > four potential approaches implemented (although it doesn't follow the >> > "separate functions" approach some are proposing and instead goes with >> > the >> > allow_bytes approach I originally proposed). >> >> All of them have this construct: >> >> try: >> path = path.__fspath__() >> except AttributeError: >> pass >> >> Is that the intention, or should the exception catching be narrower? I >> know it's clunky to write it in Python, but AIUI it's less so in C: >> >> try: >> callme = path.__fspath__ >> except AttributeError: >> pass >> else: >> path = callme() > > > I'm assuming the C code will do what you're suggesting. My way is just > faster to write in 2 minutes of coding. :) Cool cool. Just checking! You're already aware that my preference is for the first one, str-only. I don't think the second one has much value (a path-like object can only ever return a str, but a bytes object can be passed through unchanged?), and the fourth strikes me as a bad idea (just allowing bytes any time). So my votes are +1, -0.5, +0, -1. ChrisA
Re: [Python-Dev] pathlib - current status of discussions
On Wed, Apr 13, 2016 at 3:24 PM, Chris Angelico wrote: > Is that the intention, or should the exception catching be narrower? I > know it's clunky to write it in Python, but AIUI it's less so in C: > > try: > callme = path.__fspath__ > except AttributeError: > pass > else: > path = callme() +1 for this variant; I really don't like masking errors inside the __fspath__ implementation. -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein
Re: [Python-Dev] pathlib - current status of discussions
On Wed, 13 Apr 2016 at 12:25 Chris Angelico wrote: > On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon wrote: > > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the > > four potential approaches implemented (although it doesn't follow the > > "separate functions" approach some are proposing and instead goes with the > > allow_bytes approach I originally proposed). > > All of them have this construct: > > try: > path = path.__fspath__() > except AttributeError: > pass > > Is that the intention, or should the exception catching be narrower? I > know it's clunky to write it in Python, but AIUI it's less so in C: > > try: > callme = path.__fspath__ > except AttributeError: > pass > else: > path = callme() > I'm assuming the C code will do what you're suggesting. My way is just faster to write in 2 minutes of coding. :)
Re: [Python-Dev] pathlib - current status of discussions
On Thu, Apr 14, 2016 at 3:10 AM, Brett Cannon wrote: > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the > four potential approaches implemented (although it doesn't follow the > "separate functions" approach some are proposing and instead goes with the > allow_bytes approach I originally proposed). All of them have this construct:

    try:
        path = path.__fspath__()
    except AttributeError:
        pass

Is that the intention, or should the exception catching be narrower? I know it's clunky to write it in Python, but AIUI it's less so in C:

    try:
        callme = path.__fspath__
    except AttributeError:
        pass
    else:
        path = callme()

ChrisA
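To see why the narrower form matters, here is a small sketch with a deliberately buggy `__fspath__`. `BuggyPath`, `fspath_broad` and `fspath_narrow` are illustrative names; the two functions simply wrap the two constructs quoted in this message.

```python
class BuggyPath:
    """Path-like object whose __fspath__ contains a typo."""
    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        return self.nmae   # bug: raises AttributeError *inside* __fspath__


def fspath_broad(path):
    # Catches AttributeError from both the lookup and the call.
    try:
        path = path.__fspath__()
    except AttributeError:
        pass
    return path


def fspath_narrow(path):
    # Only a *missing* __fspath__ means "not path-like"; bugs inside it propagate.
    try:
        callme = path.__fspath__
    except AttributeError:
        pass
    else:
        path = callme()
    return path


buggy = BuggyPath('notes.txt')
print(fspath_broad(buggy) is buggy)   # True: the bug is silently swallowed
try:
    fspath_narrow(buggy)
except AttributeError as exc:
    print('narrow version surfaces the bug:', exc)
```

The `hasattr(path, "__fspath__")` spelling suggested later in the thread behaves like `fspath_narrow`, because `hasattr()` only performs the attribute lookup, not the call.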
Re: [Python-Dev] pathlib - current status of discussions
Brett Cannon writes: > In the spirit of Python 3 I feel like bytes might not be needed in practice, > but something like this with defaults of False will allow people to easily > test all the various options. > > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed). Either number 1 or number 3 for me (I don't think bytes path-like objects are useful in Python). Regards Antoine.
Re: [Python-Dev] pathlib - current status of discussions
On 04/13/2016 10:22 AM, Alexander Walters wrote: > On 4/13/2016 13:10, Brett Cannon wrote: >> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed). > Number 4 is my personal favorite - it has a simple control flow path and is the least needlessly restrictive. Number 3: it allows bytes, but only when told it's okay to do so. Having code get a bytes object when one is not expected is not a headache we need to inflict on anyone. -- ~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
On 4/13/2016 13:10, Brett Cannon wrote: > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed). Number 4 is my personal favorite - it has a simple control flow path and is the least needlessly restrictive. (I could rant about needless restrictions, but I am about a decade late for that, so I won't bother.)
Re: [Python-Dev] pathlib - current status of discussions
On Tue, 12 Apr 2016 at 22:38 Michael Mysinger via Python-Dev <python-dev@python.org> wrote: > Ethan Furman writes: > > > Do we allow bytes to be returned from os.fspath()? If yes, then do we > > allow bytes from __fspath__()? > > De-lurking. Especially since the ultimate goal is better interoperability, I feel like an implementation that people can play with would help guide the few remaining decisions. To help test the various options you could temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for each. > > In the spirit of Python 3 I feel like bytes might not be needed in practice, but something like this with defaults of False will allow people to easily test all the various options. > https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 has the four potential approaches implemented (although it doesn't follow the "separate functions" approach some are proposing and instead goes with the allow_bytes approach I originally proposed).
Re: [Python-Dev] pathlib - current status of discussions
On 13 April 2016 at 02:19, Chris Barker wrote: > So: why use strings as the lingua franca of paths? i.e. the basis of the > path protocol. maybe we should support only two path representations: > > 1) A "proper" path object -- i.e. pathlib.Path or anything else that > supports the path protocol. > > 2) the bytes that the OS actually needs. > > this would mean that the protocol would be to have a __pathbytes__() method > that would return the bytes that should be passed off to the OS. The reason to favour strings over raw bytes for path manipulation is the same reason to favour them anywhere else: to avoid having to worry about encodings *while* you're manipulating things, and instead only worry about the encoding when actually talking to the OS (which may be UTF-16-LE to talk to a Windows API, or UTF-8 to talk to a *nix API, or something else entirely if your OS is set up that way, or you're writing the path to a file or network packet, rather than using it locally). Regardless of what we decide about os.fspath's return type, that general principle won't change - if you're manipulating bytes paths directly, you're doing something relatively specialised (like working on CPython's own os module). Cheers, Nick.
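A small illustration of "manipulate as str, encode only at the boundary". The path and the two target encodings are assumptions chosen for the example (UTF-8 as a typical *nix filesystem encoding, UTF-16-LE as the layout of Windows wide-character strings); in real code the os module, or `os.fsencode()`, makes this choice for the current platform.

```python
import posixpath

# Manipulate the path as text: no encoding decisions are needed here.
path = posixpath.join('données', 'résumé.txt')

# Encode only at the boundary, in whatever form the target API expects.
for_posix = path.encode('utf-8')        # assumed *nix filesystem encoding
for_windows = path.encode('utf-16-le')  # layout of Windows wide-character strings

print(for_posix)          # b'donn\xc3\xa9es/r\xc3\xa9sum\xc3\xa9.txt'
print(len(for_windows))   # 36: two bytes for each of the 18 characters
```

The same text value yields quite different byte sequences depending on the destination, which is exactly the detail the str-based manipulation code never has to see.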
Re: [Python-Dev] pathlib - current status of discussions
Ethan Furman writes: > Do we allow bytes to be returned from os.fspath()? If yes, then do we > allow bytes from __fspath__()? De-lurking. Especially since the ultimate goal is better interoperability, I feel like an implementation that people can play with would help guide the few remaining decisions. To help test the various options you could temporarily add a _allow_bytes=GLOBAL_CONFIG_OPTION default argument to both pathlib.__fspath__() and os.fspath(), with distinct configurable defaults for each. In the spirit of Python 3 I feel like bytes might not be needed in practice, but something like this with defaults of False will allow people to easily test all the various options.
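One detail worth noting about the suggested `_allow_bytes=GLOBAL_CONFIG_OPTION` default: Python evaluates default arguments once, when the `def` runs, so flipping the global afterwards would not change the default. A minimal sketch that reads the switch at call time instead; `GLOBAL_ALLOW_BYTES` and `experimental_fspath` are hypothetical names, and the sketch leans on `os.fspath()` as it exists in Python 3.6 and later.

```python
import os

# Hypothetical experiment switch; not a real stdlib setting.
GLOBAL_ALLOW_BYTES = False


def experimental_fspath(path, _allow_bytes=None):
    # Writing `_allow_bytes=GLOBAL_ALLOW_BYTES` would freeze the value at
    # def time. Using None as a sentinel reads the global on every call.
    if _allow_bytes is None:
        _allow_bytes = GLOBAL_ALLOW_BYTES
    result = os.fspath(path)   # str or bytes
    if isinstance(result, bytes) and not _allow_bytes:
        raise TypeError('bytes paths are disabled by GLOBAL_ALLOW_BYTES')
    return result


print(experimental_fspath('a.txt'))                      # a.txt
print(experimental_fspath(b'a.txt', _allow_bytes=True))  # b'a.txt'
GLOBAL_ALLOW_BYTES = True      # flip the switch for the whole module
print(experimental_fspath(b'a.txt'))                     # b'a.txt'
```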
Re: [Python-Dev] pathlib - current status of discussions
On Tue, Apr 12, 2016 at 6:52 PM, Stephen J. Turnbull wrote: > > (A) Why does anybody need bytes out of a pathlib.Path (or other > __fspath__-toting, higher-level API) *inside* the boundary? Note > that the APIs in os (etc) *don't need* bytes because they are > already polymorphic. > Indeed not from pathlib.*Path, but from DirEntry, which may have a path as bytes. So the options for DirEntry (or things like Ethan's 'antipathy') are:

(1) Provide bytes or str via the protocol, depending on which type this DirEntry has. Downside: The protocol needs to support str and bytes.
(2) Decode bytes using os.fsdecode and provide a str via the protocol. Downside: The user passed in bytes and maybe had a reason to do so. This might lead to a weird mixture of str and bytes in the same code.
(3) Do not implement the protocol when dealing with bytes. Downside: If a function calling os.scandir accepts both bytes and str in a duck-typing fashion, then, if it adopted something that uses the new protocol, it would lose its bytes compatibility.

This risk might not be huge, so perhaps (3) is an option? > (B) If they do, why can't they just apply bytes() to the object? I > understand that that would offend Ethan's aesthetic sense, so it's > worth looking for a nice way around it. But allowing __fspath__ > to return bytes or str is hideous, because Paths are clearly on > the application side of the boundary. > > Note that bytes() may not have the serious problem that str() does of > being too catholic about its argument: nothing in __builtins__ has a > __bytes__! Of course there are a few things that do work: ints, and > sequences of ints. Good point. But this only applies when the user _explicitly_ deals with bytes. When the user just deals with the type (str or bytes) that is passed in, as os.path.* as well as DirEntry now do, this does not work. -Koos
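The three options can be modelled with two small stand-in classes. `PathLikeEntry`, `PlainEntry` and `make_entry` are hypothetical and only approximate a DirEntry (which, from `os.scandir(b'.')`, does carry a bytes `path`); the sketch shows what a consumer probing for `__fspath__` would see under each option.

```python
import os


class PathLikeEntry:
    """Stand-in for a DirEntry-like object that implements the protocol."""
    def __init__(self, path):
        self.path = path

    def __fspath__(self):
        return self.path


class PlainEntry:
    """Stand-in that deliberately does NOT implement __fspath__."""
    def __init__(self, path):
        self.path = path


def make_entry(path, option):
    """Model options (1)-(3) for an entry whose path may be bytes."""
    if isinstance(path, str):
        return PathLikeEntry(path)               # str entries are path-like in every option
    if option == 1:
        return PathLikeEntry(path)               # (1) protocol hands back bytes as-is
    if option == 2:
        return PathLikeEntry(os.fsdecode(path))  # (2) decode to str first
    return PlainEntry(path)                      # (3) bytes entries are not path-like


for option in (1, 2, 3):
    entry = make_entry(b'data.bin', option)
    print(option, getattr(entry, '__fspath__', lambda: 'no protocol')())
```

Option (1) pushes the str/bytes split onto every protocol consumer, option (2) silently changes the type the caller passed in, and option (3) makes bytes entries drop out of any code that only accepts path-like objects.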
Re: [Python-Dev] pathlib - current status of discussions
On Tue, Apr 12, 2016 at 9:32 AM, Koos Zevenhoven wrote: > > 1) A "proper" path object -- i.e. pathlib.Path or anything else that > > supports the path protocol. > > > > 2) the bytes that the OS actually needs. > > > > You do have a point there. But since bytes pathnames are deprecated on > Windows, Ah -- there's the fatal flaw -- even Windows needs bytes at the lowest level, but the decision was already made there to use str as the lingua franca -- i.e. the user NEVER sees a path as a bytestring on Windows? I guess that's decided then. str is the exchange format. -CHB
Re: [Python-Dev] pathlib - current status of discussions
On 04/12/2016 09:26 AM, Koos Zevenhoven wrote: > So I'm, once again, posing this question (that I don't think got any reactions previously): Is there a significant audience for this new function, or is it enough to keep it a private function for the stdlib to use? Quite frankly, I expect the stdlib itself to be the primary consumer. But I see no reason not to publish the function so that users who need the advanced functionality have easy access to it. -- ~Ethan~
Re: [Python-Dev] pathlib - current status of discussions
On Tue, Apr 12, 2016 at 7:19 PM, Chris Barker wrote: > > One more thought came up just now: there are different levels of abstraction > and representations for paths. We don't want to make Path a subclass of > string, because Path is supposed to be a higher level abstraction -- good. > > then at the bottom of the stack, we NEED the bytes level path, because that's > what ultimately gets passed to the OS. > > The legacy from the single-byte encoding days is that bytes and strings were > the same, so we could let people work with nice human readable strings, > while also working with byte paths in the same way -- but those days are > gone -- py3 makes a clear (and important) distinction between nice human > readable strings and the bytes that represent them. > > So: why use strings as the lingua franca of paths? i.e. the basis of the > path protocol. maybe we should support only two path representations: > > 1) A "proper" path object -- i.e. pathlib.Path or anything else that > supports the path protocol. > > 2) the bytes that the OS actually needs. > You do have a point there. But since bytes pathnames are deprecated on Windows, this seems to lead to supporting both str and bytes in the protocol, or having two protocols __fspathbytes__ and __fspathstr__ (and one being preferred over the other, potentially even depending on the platform). -Koos
Re: [Python-Dev] pathlib - current status of discussions
On Tue, Apr 12, 2016 at 11:56 AM, Nick Coghlan wrote: > One possible way to address this concern would be to have the > underlying protocol be bytes/str (since boundary code frequently needs > to handle the paths-are-bytes assumption in POSIX), but offer an > "os.fspathname" API that rejected bytes output from os.fspath. That > is, it would be equivalent to: > > def fspathname(path): > name = os.fspath(path) > if not isinstance(name, str): > raise TypeError("Expected str for pathname, not > {}".format(type(name))) > return name > > That way folks that wanted the clean "must be str" signature could use > os.fspathname, while those that wanted to accept either could use the > lower level os.fspath. I'm not necessarily opposed to this. I kept bringing up bytes in the discussion because os.path.* etc. and DirEntry support bytes and will need to keep doing so for backwards compatibility. I have no intention to use bytes pathnames myself. But it may break existing code if functions, for instance, began to decode bytes paths to str if they did not previously do so (or to reject them). It is indeed a lot safer to make new code not support bytes paths than to change the behavior of old code. But then again, do we really recommend that new code use os.fspath (or os.fspathname)? Should they not be using either pathlib or os.path.* etc. so they don't have to care? I'm sure Ethan and his library (or some other path library) will manage without the function in the stdlib, as long as the dunder attribute is there. So I'm, once again, posing this question (that I don't think got any reactions previously): Is there a significant audience for this new function, or is it enough to keep it a private function for the stdlib to use?
That handful of third-party path libraries can decide for themselves if they want to (a) reject bytes or (b) implicitly fsdecode them or (c) pass them through just like str, depending on whatever their case requires in terms of backwards compatibility or other goals. If we forget about the os.fswhatever function, we only have to decide whether the magic dunder attribute can be str or bytes or just str. -Koos
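A runnable version of the quoted `fspathname()` proposal, written against `os.fspath()` as it exists in Python 3.6 and later (which may return str or bytes). The function name comes from the proposal; it is not an actual stdlib API.

```python
import os
import pathlib


def fspathname(path):
    """Like os.fspath(), but refuse bytes results (the proposed "must be str" API)."""
    name = os.fspath(path)   # may return str or bytes
    if not isinstance(name, str):
        raise TypeError('expected str for pathname, not {}'.format(type(name).__name__))
    return name


print(fspathname(pathlib.PurePosixPath('docs', 'index.txt')))   # docs/index.txt
try:
    fspathname(b'docs/index.txt')
except TypeError as exc:
    print(exc)   # expected str for pathname, not bytes
```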
Re: [Python-Dev] pathlib - current status of discussions
On Mon, Apr 11, 2016 at 10:40 PM, Greg Ewing wrote: > > So the ONLY thing >> you should do with it is pass it along to another low level system >> call. >> > > Not quite -- you can separate it into components and > work with them. Essentially the same set of operations > that os.path provides. > ahh yes, so while posix claims that paths are "just a char*", they are really bytes where we can assume that the byte with value 0x2F is the pathsep (and that 0x2E separates an extension?), so I suppose os.path is useful. But I still think that most of us should never deal with bytes paths, and the few that need to should just work with the low level functions and be done with it. One more thought came up just now: there are different levels of abstraction and representations for paths. We don't want to make Path a subclass of string, because Path is supposed to be a higher level abstraction -- good. Then at the bottom of the stack, we NEED the bytes level path, because that's what ultimately gets passed to the OS. The legacy from the single-byte encoding days is that bytes and strings were the same, so we could let people work with nice human readable strings, while also working with byte paths in the same way -- but those days are gone -- py3 makes a clear (and important) distinction between nice human readable strings and the bytes that represent them. So: why use strings as the lingua franca of paths? i.e. the basis of the path protocol. Maybe we should support only two path representations: 1) A "proper" path object -- i.e. pathlib.Path or anything else that supports the path protocol. 2) the bytes that the OS actually needs. This would mean that the protocol would be to have a __pathbytes__() method that would return the bytes that should be passed off to the OS. A posix Path implementation could store that internal bytes representation, so it could pass it off unchanged if that's all you need to do. Any current API that takes bytes could easily be made to work.
I'm SURE I'm missing something really big here, but it seems like maybe it's better to get farther from "strings as paths" rather than closer to it -CHB
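Greg's point, that bytes paths can still be split into components, holds because `os.path` functions accept bytes. The sketch uses `posixpath` explicitly so it behaves the same on any platform; the sample path is invented and deliberately not valid UTF-8.

```python
import posixpath

raw = b'/srv/data/caf\xe9.tar.gz'     # not valid UTF-8, but still a usable POSIX path

head, tail = posixpath.split(raw)     # splits on the 0x2F ('/') byte
stem, ext = posixpath.splitext(tail)  # splits on the last 0x2E ('.') byte
print(head)                           # b'/srv/data'
print(tail)                           # b'caf\xe9.tar.gz'
print(stem, ext)                      # b'caf\xe9.tar' b'.gz'
print(posixpath.join(head, b'other.txt'))   # b'/srv/data/other.txt'
```

No decoding happens anywhere, so the undecodable byte survives untouched; the price is that every argument must be bytes (mixing in a str raises TypeError).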
Re: [Python-Dev] pathlib - current status of discussions
On 04/11/2016 02:58 PM, Ethan Furman wrote: > Sticking points: > --- > Do we allow bytes to be returned from os.fspath()? If yes, then do we allow bytes from __fspath__()? On 04/11/2016 10:28 PM, Stephen J. Turnbull wrote: > In text applications, "bytes as carcinogen" is an apt metaphor. On 04/12/2016 08:25 AM, Chris Angelico wrote: > I would say No and No, on the basis that it's *far* easier to widen > their scope in 3.7 than to narrow it. On 04/11/2016 08:45 PM, Nick Coghlan wrote: > I've come around to the point of view that allowing both str and > bytes-like objects to pass through unchanged makes sense, with the > rationale being the one someone mentioned regarding ease-of-use in > os.path. [...] > One possible way to address this concern would be to have the underlying protocol be bytes/str (since boundary code frequently needs to handle the paths-are-bytes assumption in POSIX), but offer an "os.fspathname" API that rejected bytes output from os.fspath. I think this is the way forward: offer a standard way to get paths-as-strings, with an easily supported way of working with paths-as-bytes. This could be done with an os.fspathname() & os.fspath() pair of functions, or with a single function that has a parameter specifying what to do with bytes objects: reject (default), accept, or (maybe) an encoding to use to coerce to bytes. -- ~Ethan~
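A sketch of the single-function variant with a bytes policy parameter. `fspath_with_policy` and `bytes_policy` are hypothetical names, and treating a codec name as "decode a bytes result into str" is our reading of the third option, not something the message spells out; the function builds on `os.fspath()` from Python 3.6 and later.

```python
import os


def fspath_with_policy(path, bytes_policy='reject'):
    """Hypothetical single-function alternative to an fspath()/fspathname() pair.

    bytes_policy is 'reject' (the default), 'accept', or a codec name that is
    used to decode a bytes result into str (an assumed reading of option 3).
    """
    result = os.fspath(path)            # str or bytes
    if isinstance(result, str):
        return result
    if bytes_policy == 'reject':
        raise TypeError('bytes path not accepted here')
    if bytes_policy == 'accept':
        return result
    # surrogateescape keeps undecodable bytes round-trippable
    return result.decode(bytes_policy, 'surrogateescape')


print(fspath_with_policy('a.txt'))                          # a.txt
print(fspath_with_policy(b'a.txt', bytes_policy='accept'))  # b'a.txt'
print(ascii(fspath_with_policy(b'caf\xe9', bytes_policy='utf-8')))  # 'caf\udce9'
```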
Re: [Python-Dev] pathlib - current status of discussions
Nick Coghlan writes:

> One possible way to address this concern would be to have the underlying protocol be bytes/str (since boundary code frequently needs to handle the paths-are-bytes assumption in POSIX),

What "needs"? As has been pointed out several times, with PEP 383 you can deal with bytes losslessly by using an arbitrary codec and errors=surrogateescape.

I know why *I* use bytes nevertheless: because when I must guess the encoding, it just makes more sense to read bytes and then iterate over codecs until the result looks like words I know in some language. I don't understand why people who mostly believe "bytes are text, too" because almost all they ever see are bytes in the range 0x00-0x7f need bytes. For them, fsdecode and fsencode DTRT. If you want to claim "efficiency", I can't gainsay since I don't know the applications, but if you're trying to manipulate file names millions of times per second, I have to wonder what you're doing with them that benefits so much from Path.

> but offer an "os.fspathname" API that rejected bytes output from os.fspath.

Either it's a YAGNI because I'm not going to get any bytes in the first place, or it raises where I probably could have done something useful with bytes if I were expecting them (see "pathological" below).

> That way folks that wanted the clean "must be str" signature

Er, I don't need no steenkin' "clean signature". I need str, and if I can't get it from __fspath__, there's always os.fsdecode. But this is serious horse-before cart-putting, punishing those who do things Python-3-ishly right.

> The ambiguity in question here is inherent in the differences between the way POSIX and Windows work,

Not with PEP 383, it's not. And I don't do Windows, so my preference for str has nothing to do with it mapping to native OS APIs well. The ambiguity in question here is inherent in the differences between the ways Python 2 and Python 3 programmers work on POSIX AFAICS.
Certainly, there will be times when fsdecode doesn't DTRT. So those times you have to use an explicit bytes.decode. Note that when you *do* care enough to do that, it's because the Path is *text* -- you're going to display it to a human, or pass it out of the module. If all you're going to do is access the filesystem object denoted, fsdecode does a sufficiently accurate job.

So if for some reason you're getting bytes at the boundary, I see no reason why you can't have a convenience constructor

    import os
    import pathlib

    def pathological(str_or_bytes_or_path_seq):
        args = []
        for s_o_b in str_or_bytes_or_path_seq:
            # decode bytes with the filesystem encoding; pass str and Path through
            args.append(os.fsdecode(s_o_b) if isinstance(s_o_b, bytes) else s_o_b)
        return pathlib.Path(*args)

for when that's good enough (maybe Antoine would even allow it into pathlib?)

> so there are limits to how far we can go in hiding it without making things worse rather than better.

What "hide"? Nobody is suggesting that the polymorphic os APIs should go away. Indeed, they are perfect TOOWTDI, giving the programmer exactly the flexibility needed *and no more*, *at* the boundary.

The questions on my mind are:

(A) Why does anybody need bytes out of a pathlib.Path (or other __fspath__-toting, higher-level API) *inside* the boundary? Note that the APIs in os (etc) *don't need* bytes because they are already polymorphic.

(B) If they do, why can't they just apply bytes() to the object? I understand that that would offend Ethan's aesthetic sense, so it's worth looking for a nice way around it. But allowing __fspath__ to return bytes or str is hideous, because Paths are clearly on the application side of the boundary.

Note that bytes() may not have the serious problem that str() does of being too catholic about its argument: nothing in __builtins__ has a __bytes__! Of course there are a few things that do work: ints, and sequences of ints.
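Stephen's PEP 383 point can be checked directly: decoding with the `surrogateescape` error handler maps each undecodable byte to a lone surrogate, and encoding with the same handler restores the original bytes. This minimal sketch assumes UTF-8 as the "arbitrary codec"; on POSIX, `os.fsdecode()` and `os.fsencode()` apply the same handler with the filesystem encoding.

```python
# Arbitrary bytes: 0xff and 0xfe can never appear in valid UTF-8.
raw = b"report-\xff\xfe.txt"

# Decoding cannot fail: each undecodable byte becomes a lone surrogate
# in the range U+DC80..U+DCFF.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'report-\udcff\udcfe.txt'

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", "surrogateescape")
print(restored == raw)  # True
```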
Re: [Python-Dev] pathlib - current status of discussions
Sorry for disturbing this thread's harmony.

On 12.04.2016 08:00, Ethan Furman wrote:
> On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote:
>>> Consider os.path.join:
>> Why in the world do the os.path functions need to work with Path objects? (and other conforming objects)
>
> Because library XYZ that takes a path and wants to open it shouldn't have to care whether that path is a string or pathlib.Path -- but if os.open can't use pathlib.Path then the library has to care (or the user has to care).
>
>> This all started with the goal of using Path objects in the stdlib, but that's for opening files, etc.
>
> Etc. as in os.join? os.stat? os.path.split?
>
>> Path is an alternative to os.path -- you don't need to use both.

I agree with that quote of Chris.

> As a user you don't, no. As a library that has no control over what kind of "path" is passed to you -- well, if os and os.path can accept Path objects then you can just use os and os.path; otherwise you have to use os and os.path if passed a str or bytes, and pathlib.Path if passed a pathlib.Path -- so you do have to use both.

I don't agree here. There's no need to increase the convenience for a library maintainer when it comes to implicit conversions. When people want to use your library and it requires a string, they can simply use "my_path.path" and everything still works for them when they switch to pathlib.

Best,
Sven
Re: [Python-Dev] pathlib - current status of discussions
On Tue, Apr 12, 2016 at 7:58 AM, Ethan Furman wrote:
> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()? If yes, then do we allow bytes from __fspath__()?

I would say No and No, on the basis that it's *far* easier to widen their scope in 3.7 than to narrow it. Once you declare that one or both of these may return bytes, it becomes an annoying incompatibility to change that (even if it *is* marked provisional), which almost certainly means it won't happen. By restricting them both, we force the issue: if you want bytes, you'll know about it.

I'd also prefer to stick to Unicode path names, for reasons I've stated in other threads. Undecodable path byte streams can be handled already, so what are we really gaining by allowing a Path-like object to emit bytes? If it becomes a major issue for a lot of types, it wouldn't be hard to add a helper function somewhere (or a mixin class that provides a ready-to-go __fspath__, which might well be sufficient).

ChrisA
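Chris's mixin idea could look something like the following. `DecodedFSPathMixin`, `StoredPath`, and the `raw_path` attribute are hypothetical names for illustration; the sketch assumes Python 3.6+ (for `os.fspath()`) and that decoding with `os.fsdecode()` is acceptable for objects that happen to store their path as bytes.

```python
import os

class DecodedFSPathMixin:
    """Hypothetical mixin supplying a str-only __fspath__.

    Subclasses are assumed to set self.raw_path to a str or bytes path.
    """
    def __fspath__(self):
        # os.fsdecode passes str through unchanged and decodes bytes with
        # the filesystem encoding, so callers only ever receive str.
        return os.fsdecode(self.raw_path)

class StoredPath(DecodedFSPathMixin):
    """Hypothetical type that keeps whatever it was given."""
    def __init__(self, raw_path):
        self.raw_path = raw_path

print(os.fspath(StoredPath(b"data/log.txt")))  # data/log.txt
print(os.fspath(StoredPath("data/log.txt")))   # data/log.txt
```

Whatever the stored type, the protocol only ever emits str, which is the property Chris argues for.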
Re: [Python-Dev] pathlib - current status of discussions
On 12 April 2016 at 15:28, Stephen J. Turnbull wrote:
> Donald Stufft writes:
>
> > I think yes and yes [__fspath__ and fspath should be allowed to handle bytes, otherwise] it seems like making it needlessly harder to deal with a bytes path
>
> It's not needless. This kind of polymorphism makes it hard to review code locally. Once bytes get a foothold inside a text application, they metastasize altogether too easily, and you end up with TypeErrors or UnicodeErrors quite far from the origin. Debugging often requires tracing data flows over hill and over dale while choking from the dusty trail, or band-aids like a top-level "except UnicodeError: log_and_quarantine(bytes)". I can't prove that returning bytes from these APIs is a big risk in this sense, but I can't see a way to prove that it's not, either, given that their point is duck-typing, and therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way down, but by the very nature of computing systems, there are systems where bytes are decoded to text. For historical reasons (the encoding Tower of Babel), it's very error-prone to do that on demand. Best practice is to do the conversion as close to the boundary as possible, and process only text internally.

One possible way to address this concern would be to have the underlying protocol be bytes/str (since boundary code frequently needs to handle the paths-are-bytes assumption in POSIX), but offer an "os.fspathname" API that rejected bytes output from os.fspath. That is, it would be equivalent to:

    import os

    def fspathname(path):
        name = os.fspath(path)
        if not isinstance(name, str):
            raise TypeError("Expected str for pathname, not {}".format(type(name)))
        return name

That way folks that wanted the clean "must be str" signature could use os.fspathname, while those that wanted to accept either could use the lower level os.fspath.
The ambiguity in question here is inherent in the differences between the way POSIX and Windows work, so there are limits to how far we can go in hiding it without making things worse rather than better.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
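For comparison, Python 3.6 later shipped the permissive `os.fspath()` (PEP 519): str and bytes are returned unchanged, otherwise `__fspath__()` is called and may return either. Nick's stricter wrapper layers on top of it as below; `fspathname` is the proposal's name, not a function in `os`, and `BytesBackedPath` is a hypothetical class used only to produce a bytes result.

```python
import os
import pathlib

def fspathname(path):
    """Like os.fspath(), but refuse bytes (the proposed os.fspathname)."""
    name = os.fspath(path)
    if not isinstance(name, str):
        raise TypeError("Expected str for pathname, not {}".format(type(name).__name__))
    return name

class BytesBackedPath:
    """Hypothetical path-like object whose __fspath__ returns bytes."""
    def __fspath__(self):
        return b"/tmp/data"

print(os.fspath(BytesBackedPath()))                    # b'/tmp/data'
print(fspathname(pathlib.PurePosixPath("/tmp/data")))  # /tmp/data
# fspathname(BytesBackedPath()) raises TypeError
```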
Re: [Python-Dev] pathlib - current status of discussions
On 12 April 2016 at 06:28, Stephen J. Turnbull wrote:
> Donald Stufft writes:
>
> > I think yes and yes [__fspath__ and fspath should be allowed to handle bytes, otherwise] it seems like making it needlessly harder to deal with a bytes path
>
> It's not needless. This kind of polymorphism makes it hard to review code locally. Once bytes get a foothold inside a text application, they metastasize altogether too easily, and you end up with TypeErrors or UnicodeErrors quite far from the origin. Debugging often requires tracing data flows over hill and over dale while choking from the dusty trail, or band-aids like a top-level "except UnicodeError: log_and_quarantine(bytes)". I can't prove that returning bytes from these APIs is a big risk in this sense, but I can't see a way to prove that it's not, either, given that their point is duck-typing, and therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way down, but by the very nature of computing systems, there are systems where bytes are decoded to text. For historical reasons (the encoding Tower of Babel), it's very error-prone to do that on demand. Best practice is to do the conversion as close to the boundary as possible, and process only text internally.
>
> In text applications, "bytes as carcinogen" is an apt metaphor.
>
> Now, I'm not Dutch, so I can't tell you it's obvious that the risk to text-processing applications is more important than the inconvenience to byte-shoveling applications. But there is a need to be parsimonious with polymorphism.

As someone who has done a lot of work helping projects to port from the 2.x bytes/text model to the 3.x model, I have similar concerns that rooting out the source of bytes objects appearing in a program could be an issue with the proposed "return either" approach.
The most effective tool I have found in fixing programs with text/bytes issues is carefully and thoroughly annotating precisely which functions accept and return bytes, and which accept and return text. The sort of mixed-mode processing we're talking about here makes that substantially harder. And note that the signature of os.fspath can return bytes or text *independent* of the type of the argument - it's not a "bytes in, bytes out" function like the usual pattern of "polymorphic support for bytes".

But just like Stephen, I have no feel for how significant the risk will be in real life. I've never worked on code that actually has a need for bytestring paths (particularly now that surrogateescape ensures that most cases "just work").

Paul
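Paul's distinction can be made concrete, assuming Python 3.6+ semantics for `os.fspath()`. `ConfiguredPath` is a hypothetical class whose instances may hold either type: the `os.path` functions are "bytes in, bytes out", but two calls to `os.fspath()` with arguments of the *same* type can return different result types.

```python
import os

class ConfiguredPath:
    """Hypothetical path-like type: one class, but __fspath__ may return
    str or bytes depending on how the instance was built."""
    def __init__(self, value):
        self._value = value

    def __fspath__(self):
        return self._value

# os.path functions follow "bytes in, bytes out": the result type
# mirrors the argument type.
print(type(os.path.basename("dir/file")))   # <class 'str'>
print(type(os.path.basename(b"dir/file")))  # <class 'bytes'>

# os.fspath: both arguments are ConfiguredPath, yet the result types
# differ, so a reader cannot infer the result type from the argument type.
print(type(os.fspath(ConfiguredPath("dir/file"))))   # <class 'str'>
print(type(os.fspath(ConfiguredPath(b"dir/file"))))  # <class 'bytes'>
```

This is the property that makes the "annotate which functions take and return bytes" approach harder to apply to `os.fspath()` callers.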
Re: [Python-Dev] pathlib - current status of discussions
On 04/11/2016 10:14 PM, Chris Barker - NOAA Federal wrote:
>> Consider os.path.join:
> Why in the world do the os.path functions need to work with Path objects? (and other conforming objects)

Because library XYZ that takes a path and wants to open it shouldn't have to care whether that path is a string or pathlib.Path -- but if os.open can't use pathlib.Path then the library has to care (or the user has to care).

> This all started with the goal of using Path objects in the stdlib, but that's for opening files, etc.

Etc. as in os.join? os.stat? os.path.split?

> Path is an alternative to os.path -- you don't need to use both.

As a user you don't, no. As a library that has no control over what kind of "path" is passed to you -- well, if os and os.path can accept Path objects then you can just use os and os.path; otherwise you have to use os and os.path if passed a str or bytes, and pathlib.Path if passed a pathlib.Path -- so you do have to use both.

>> - the names would be fspath and __fspath__, since the result may be either a path name as text, or an encoded path name as bytes
>
> You just used the phrase "path name as bytes" -- so why is __pathname__ inappropriate if it might return bytes?

No, he used the phrase "*encoded* path name as bytes". Names are typically represented as text, and since bytes might be returned we don't want a signal that says text.

> I like __pathname__ better because this entire effort is because we've decided it's important to make the distinction between a "path" and the text representation of said path.

No, this entire effort is to make pathlib work with the rest of the stdlib.

--
~Ethan~
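Ethan's "library XYZ" argument, sketched under the behaviour Python 3.6 eventually adopted: the library normalises its argument once with `os.fspath()` and then uses plain `os.path` code for str, bytes, and `pathlib` objects alike. `split_extension` is a hypothetical library function.

```python
import os
import pathlib

def split_extension(path):
    """Hypothetical library function: accepts str, bytes, or any path-like."""
    name = os.fspath(path)          # str or bytes from here on
    base = os.path.basename(name)
    return os.path.splitext(base)   # result type follows the type of name

print(split_extension("reports/q1.csv"))                         # ('q1', '.csv')
print(split_extension(b"reports/q1.csv"))                        # (b'q1', b'.csv')
print(split_extension(pathlib.PurePosixPath("reports/q1.csv")))  # ('q1', '.csv')
```

The library never branches on the argument's type; that is the convenience Ethan wants and Sven argues is unnecessary.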
Re: [Python-Dev] pathlib - current status of discussions
Chris Barker - NOAA Federal wrote:
> Why in the world do the os.path functions need to work with Path objects?

So that applications using path objects can pass them to library code that uses os.path to manipulate them.

> I'm confused about what a bytes path IS -- is it encoded?

It's a sequence of bytes identifying a file. Often it will be an encoding of some piece of text in the file system encoding, but there's no guarantee of that.

> Can you assume it can be decoded?

Only if you use an encoding in which all byte sequences are valid, such as latin1 or utf8+surrogateescape.

> So the ONLY thing you should do with it is pass it along to another low level system call.

Not quite -- you can separate it into components and work with them. Essentially the same set of operations that os.path provides.

>> - the names would be fspath and __fspath__, since the result may be either a path name as text, or an encoded path name as bytes
>
> I like __pathname__ better because this entire effort is because we've decided it's important to make the distinction between a "path" and the text representation of said path.

I agree -- the term "pathname" can cover both text and bytes. When posix talks about pathnames it's really talking about bytes.

--
Greg
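Greg's two claims can be illustrated: `os.path`-style functions split bytes paths into components without decoding them, and latin-1 assigns a code point to every byte value, so any byte sequence round-trips. The path is made up, and the sketch calls `posixpath` directly so the results do not depend on the host OS.

```python
import posixpath

# A made-up POSIX path; a lone 0xe9 byte is not valid UTF-8.
raw = b"/srv/data/caf\xe9.txt"

# Component operations work directly on bytes, no decoding required.
head, tail = posixpath.split(raw)
print(head)  # b'/srv/data'
print(tail)  # b'caf\xe9.txt'

# latin-1 maps every byte value to a code point, so decoding never fails
# and encoding back is lossless for any byte sequence.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin-1").encode("latin-1") == all_bytes)  # True

# The decoded text is only *meaningful* if latin-1 really was the
# encoding used to create the name.
name = tail.decode("latin-1")
print(ascii(name))  # 'caf\xe9.txt'
```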
Re: [Python-Dev] pathlib - current status of discussions
Donald Stufft writes:

> I think yes and yes [__fspath__ and fspath should be allowed to handle bytes, otherwise] it seems like making it needlessly harder to deal with a bytes path

It's not needless. This kind of polymorphism makes it hard to review code locally. Once bytes get a foothold inside a text application, they metastasize altogether too easily, and you end up with TypeErrors or UnicodeErrors quite far from the origin. Debugging often requires tracing data flows over hill and over dale while choking from the dusty trail, or band-aids like a top-level "except UnicodeError: log_and_quarantine(bytes)". I can't prove that returning bytes from these APIs is a big risk in this sense, but I can't see a way to prove that it's not, either, given that their point is duck-typing, and therefore they may be generalized in the future, and by third parties.

I understand that there are applications where it's bytes all the way down, but by the very nature of computing systems, there are systems where bytes are decoded to text. For historical reasons (the encoding Tower of Babel), it's very error-prone to do that on demand. Best practice is to do the conversion as close to the boundary as possible, and process only text internally.

In text applications, "bytes as carcinogen" is an apt metaphor.

Now, I'm not Dutch, so I can't tell you it's obvious that the risk to text-processing applications is more important than the inconvenience to byte-shoveling applications. But there is a need to be parsimonious with polymorphism.
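Stephen's "convert as close to the boundary as possible" advice, sketched for directory listings: `os.listdir()` returns bytes names when given a bytes path, and each name is decoded exactly once with `os.fsdecode()` before the rest of the program sees it. `list_names` is a hypothetical helper; the temporary directory exists only to make the example self-contained.

```python
import os
import tempfile

def list_names(directory):
    """Hypothetical helper: read names as bytes at the OS boundary, return str."""
    raw_names = os.listdir(os.fsencode(directory))  # bytes path -> bytes names
    # Decode once, here; everything downstream handles only str.
    return sorted(os.fsdecode(name) for name in raw_names)

with tempfile.TemporaryDirectory() as tmp:
    for filename in ("a.txt", "b.txt"):
        open(os.path.join(tmp, filename), "w").close()
    print(list_names(tmp))  # ['a.txt', 'b.txt']
```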
Re: [Python-Dev] pathlib - current status of discussions
> with the rationale being the one someone mentioned regarding ease-of-use in os.path.
>
> Consider os.path.join:

Why in the world do the os.path functions need to work with Path objects? (and other conforming objects)

This all started with the goal of using Path objects in the stdlib, but that's for opening files, etc.

Path is an alternative to os.path -- you don't need to use both. And if you do have a byte path, you can stick with os.path

BTW, I'm confused about what a bytes path IS -- is it encoded? Can you assume it can be decoded? It seems to me that the ONLY time you should get a byte path is from a low level system call on a posix system, and you may have no idea how it's encoded. So the ONLY thing you should do with it is pass it along to another low level system call. I can't see why we should support anything else with bytes objects.

> - the names would be fspath and __fspath__, since the result may be either a path name as text, or an encoded path name as bytes

You just used the phrase "path name as bytes" -- so why is __pathname__ inappropriate if it might return bytes? I like __pathname__ better because this entire effort is because we've decided it's important to make the distinction between a "path" and the text representation of said path.

Just sayin'

-CHB
Re: [Python-Dev] pathlib - current status of discussions
On 12 April 2016 at 13:45, Nick Coghlan wrote:
> Consider os.path.join: with a permissive os.fspath, the necessary update should just be to introduce "map(os.fspath, args)" (or its C equivalent), and then continue with the existing bytes vs str handling logic.

That does remind me: once a patch is available, we should check the benchmark numbers with the patch applied. I'd expect the new protocol overhead to be swamped by the actual IO costs, but this kind of low level change can have surprising consequences.

Regarding the type checks, PyObject_AsFilesystemPath (or whatever we call it) will be implemented in C, with os.fspath just calling that, so doing "PyUnicode_Check(path) || PyBytes_Check(path)" on the result will be both cheap and convenient for API consumers (since it means they know they only have to cope with bytes or str instances internally, and will get a clear error message if handed something else).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] pathlib - current status of discussions
On 12 April 2016 at 07:58, Ethan Furman wrote:
> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()? If yes, then do we allow bytes from __fspath__()?

I've come around to the point of view that allowing both str and bytes-like objects to pass through unchanged makes sense, with the rationale being the one someone mentioned regarding ease-of-use in os.path.

Consider os.path.join: with a permissive os.fspath, the necessary update should just be to introduce "map(os.fspath, args)" (or its C equivalent), and then continue with the existing bytes vs str handling logic.

Functions consuming os.fspath can then decide on a case-by-case basis how they want to handle binary paths: either use them as is (which will usually work on mostly-ASCII systems), convert them to text with os.fsdecode (which will usually work on *nix systems), or disallow them entirely (which would probably only be appropriate for libraries that wanted to ensure support for non-ASCII paths on Windows systems).

That then cascades into the other open questions mentioned:

- permitted return types for both fspath and __fspath__ would be (str, bytes)
- the names would be fspath and __fspath__, since the result may be either a path name as text, or an encoded path name as bytes

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
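Nick's suggested os.path.join update can be sketched in pure Python: convert every argument with `os.fspath()` first, then reuse the existing str-versus-bytes logic unchanged. `join_any` is a hypothetical wrapper, and it calls `posixpath.join` so the separator does not depend on the host OS. (Since Python 3.6, `os.path.join` performs this `os.fspath` conversion itself.)

```python
import os
import pathlib
import posixpath

def join_any(*args):
    """Hypothetical wrapper: accept str, bytes, or path-like arguments."""
    # Step 1: normalise every argument to str or bytes via the protocol.
    parts = list(map(os.fspath, args))
    # Step 2: reuse the existing str-vs-bytes logic unchanged; it still
    # raises TypeError if str and bytes components are mixed.
    return posixpath.join(*parts)

print(join_any(pathlib.PurePosixPath("base"), "sub", "file.txt"))  # base/sub/file.txt
print(join_any(b"base", b"file.bin"))                              # b'base/file.bin'
```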
Re: [Python-Dev] pathlib - current status of discussions
> On Apr 11, 2016, at 5:58 PM, Ethan Furman wrote:
>
> name:
>
> We are down to two choices:
>
> - __fspath__, or
> - __fspathname__
>
> The final choice I suspect will be affected by the choice to allow (or not) bytes.

+1 on __fspath__, -0 on __fspathname__

> add a Path ABC:
> --
>
> undecided

I think it makes sense to add it, but maybe only in 3.6? Path accepting code could be updated to do something like `isinstance(obj, (bytes, str, PathMeta))` which seems like a net win to me.

> Sticking points:
> ---
>
> Do we allow bytes to be returned from os.fspath()? If yes, then do we allow bytes from __fspath__()?

I think yes and yes, it seems like making it needlessly harder to deal with a bytes path in the scenarios that you’re actually dealing with them is the kind of change that 3.0 made that ended up getting rolled back where it could.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
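Donald's isinstance check has a close counterpart in what Python 3.6 eventually added: the `os.PathLike` ABC, which recognises any class defining `__fspath__` through a structural `__subclasshook__`, so it can stand in for the `PathMeta` placeholder in the message. `accepts_path` and `LegacyPath` are hypothetical names used for this sketch.

```python
import os
import pathlib

class LegacyPath:
    """Hypothetical third-party path type that never registers with os.PathLike."""
    def __fspath__(self):
        return "legacy/location"

def accepts_path(obj):
    """The check from the message, with os.PathLike in place of PathMeta."""
    return isinstance(obj, (bytes, str, os.PathLike))

print(accepts_path("a/b"))                       # True
print(accepts_path(pathlib.PurePosixPath("a")))  # True
print(accepts_path(LegacyPath()))                # True: defining __fspath__ suffices
print(accepts_path(42))                          # False
```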
[Python-Dev] pathlib - current status of discussions
name:

We are down to two choices:

- __fspath__, or
- __fspathname__

The final choice I suspect will be affected by the choice to allow (or not) bytes.

method or attribute:
---

method

built-in:

Almost - we'll put it in the os module

add to str:
--

No, not all strings are paths.

add to C API:

Yes. Possible names include PyUnicode_FromFSPath and PyObject_Path -- again, the choice of bytes inclusion will affect the final choice of name.

add a Path ABC:
--

undecided

Sticking points:
---

Do we allow bytes to be returned from os.fspath()? If yes, then do we allow bytes from __fspath__()?

--
~Ethan~