Nick Coghlan writes:

> One possible way to address this concern would be to have the
> underlying protocol be bytes/str (since boundary code frequently
> needs to handle the paths-are-bytes assumption in POSIX),

What "needs"? As has been pointed out several times, with PEP 383 you
can deal with bytes losslessly by using an arbitrary codec and
errors=surrogateescape.

I know why *I* use bytes nevertheless: because when I must guess the
encoding, it just makes more sense to read bytes and then iterate over
codecs until the result looks like words I know in some language. I
don't understand why people who mostly believe "bytes are text, too",
because almost all they ever see are bytes in the range 0x00-0x7f,
need bytes. For them, fsdecode and fsencode DTRT.

If you want to claim "efficiency", I can't gainsay since I don't know
the applications, but if you're trying to manipulate file names
millions of times per second, I have to wonder what you're doing with
them that benefits so much from Path.

> but offer an "os.fspathname" API that rejected bytes output from
> os.fspath.

Either it's a YAGNI because I'm not going to get any bytes in the
first place, or it raises where I probably could have done something
useful with bytes if I were expecting them (see "pathological" below).

> That way folks that wanted the clean "must be str" signature

Er, I don't need no steenkin' "clean signature". I need str, and if I
can't get it from __fspath__, there's always os.fsdecode. But this is
serious cart-before-horse-putting, punishing those who do things
Python-3-ishly right.

> The ambiguity in question here is inherent in the differences between
> the way POSIX and Windows work,

Not with PEP 383, it's not. And I don't do Windows, so my preference
for str has nothing to do with it mapping to native OS APIs well. The
ambiguity in question here is inherent in the differences between the
ways Python 2 and Python 3 programmers work on POSIX AFAICS.

Certainly, there will be times when fsdecode doesn't DTRT. At those
times you have to use an explicit bytes.decode.
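
For concreteness, here is a minimal sketch of the lossless round trip that PEP 383 provides. The sample file name and the choice of UTF-8 are illustrative assumptions, not anything taken from Nick's mail; on POSIX, os.fsdecode and os.fsencode apply the same error handler using the filesystem encoding.

```python
# Hypothetical file name containing a byte (0xff) that is not valid UTF-8.
raw = b"caf\xff.txt"

# Decoding with surrogateescape does not fail: the undecodable byte is
# carried through as a lone surrogate code point (U+DCFF).
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same codec and error handler restores the exact bytes.
roundtrip = name.encode("utf-8", errors="surrogateescape")
assert roundtrip == raw
```

The resulting str is fine for handing back to the filesystem, but it is not guaranteed to be printable, which is the case where an explicit decode with a known codec is still needed.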
Note that when you *do* care enough to do that, it's because the Path
is *text* -- you're going to display it to a human, or pass it out of
the module. If all you're going to do is access the filesystem object
denoted, fsdecode does a sufficiently accurate job.

So if for some reason you're getting bytes at the boundary, I see no
reason why you can't have a convenience constructor

    import os
    import pathlib

    def pathological(str_or_bytes_or_path_seq):
        # Decode any bytes components; pass str and Path objects through.
        args = []
        for s_o_b in str_or_bytes_or_path_seq:
            args.append(os.fsdecode(s_o_b) if isinstance(s_o_b, bytes) else s_o_b)
        return pathlib.Path(*args)

for when that's good enough (maybe Antoine would even allow it into
pathlib?).

> so there are limits to how far we can go in hiding it without
> making things worse rather than better.

What "hide"? Nobody is suggesting that the polymorphic os APIs should
go away. Indeed, they are perfect TOOWTDI, giving the programmer
exactly the flexibility needed *and no more*, *at* the boundary.

The questions on my mind are:

(A) Why does anybody need bytes out of a pathlib.Path (or other
    __fspath__-toting, higher-level API) *inside* the boundary? Note
    that the APIs in os (etc) *don't need* bytes because they are
    already polymorphic.

(B) If they do, why can't they just apply bytes() to the object? I
    understand that that would offend Ethan's aesthetic sense, so it's
    worth looking for a nice way around it. But allowing __fspath__
    to return bytes or str is hideous, because Paths are clearly on
    the application side of the boundary.

Note that bytes() may not have the serious problem that str() does of
being too catholic about its argument: nothing in __builtins__ has a
__bytes__! Of course there are a few things that do work: ints, and
sequences of ints.
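
As a rough illustration of question (B) and of the polymorphism of the os APIs, the following sketch applies bytes() to a pathlib object and passes both forms to os.path.basename. The path "/tmp/example.txt" is a made-up example, and PurePosixPath is used only so the snippet behaves the same on any platform.

```python
import os
import pathlib

p = pathlib.PurePosixPath("/tmp/example.txt")  # hypothetical path

# pathlib paths define __bytes__, so bytes() yields the fsencode()d form.
as_bytes = bytes(p)
assert as_bytes == os.fsencode(str(p))

# os.path functions are polymorphic: str in, str out; bytes in, bytes out.
assert os.path.basename(str(p)) == "example.txt"
assert os.path.basename(as_bytes) == b"example.txt"
```

Nothing here requires __fspath__ itself to return bytes; the conversion happens explicitly, at the point where bytes are actually wanted.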