Chris Angelico writes:

 > AFAICT, the compatibility layer would simply decode the bytes using
 > surrogateescape handling, which should round-trip anything.

By design. See PEP 383. Or rather, the OP should; he has not done his
homework and is confused by his own FUD.

This whole subthread is really python-list territory. Since a lot of
people I respect seem uncertain about the facts, for the record, let's
lay out the (putative) issues remaining for post-PEP-383 Python vs.
str-y path objects.

(0) "Can't work with some POSIX (bytes) paths" is closed by PEP 383,
    forget it. os.fsdecode(bytespath) as soon as you get one,
    os.fsencode(strpath) just before you need one, done. Surrogates
    embedded in strpath may need special handling depending on the
    application (see (1)).

(1) str.encode(errors='strict') (the default) will blow up on embedded
    surrogates. Yes, but that's a *good* thing if you're mixing str
    derived from filesystem paths with other text. There's no way to
    avoid it. If you're just passing it back to open(), it Just Works,
    done.

(2) You're using bytes as text a la 2.x for "efficiency's" sake, and
    you're worried that you might pass a str-y Path deep into bytes
    territory and it will explode there. I don't think there is any
    sympathy left for that use case on Python dev channels. Define a
    clear boundary with well-defined entry and exit gates, and convert
    there. Then you can get some sleep. (How-to example: your
    "compatibility layer".)

(3) You're worried about inefficiency of decoding/encoding to the same
    or trivially changed bytes (ie, you didn't need pathlib in the
    first place, but you got it anyway) -- this especially matters for
    2.7, but is significant for 3.x too, if you're using a bunch of
    paths in a tight loop. I don't have sympathy for that use case,
    but Brett and Guido do, and Brett's PEP handles it by making
    __fspath__ polymorphic in the usual os.path-y way, with Guido's
    modification.

    This is always a tradeoff. If you know your JPEGs all have
    extension '.JPG' and png_path = jpeg_path[:-4] + b'.png' is
    readable enough for you, use that, not pathlib or Antipathy, and
    you get your efficiency.
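
For anyone who wants to see (0), (1) and the bytes-slicing case from (3) side by side, here is a minimal sketch. It assumes a POSIX system; the file name and every variable name in it are made up for illustration and do not come from any real code base.

```python
import os

# Hypothetical file name containing a byte (0xff) that is not valid UTF-8.
raw_name = b"photo-\xff.JPG"

# (0) Decode at the entry gate, encode at the exit gate.  On POSIX,
# os.fsdecode/os.fsencode use the filesystem encoding together with the
# surrogateescape error handler, so the original bytes survive unchanged.
as_str = os.fsdecode(raw_name)
round_tripped = os.fsencode(as_str)

# The same mechanism spelled out with an explicit codec, so the result
# does not depend on the locale: 0xff becomes the lone surrogate U+DCFF.
escaped = raw_name.decode("utf-8", errors="surrogateescape")

# (1) A strict encode refuses the smuggled surrogate, which is what you
# want when path-derived str is about to be mixed with ordinary text.
try:
    escaped.encode("utf-8")  # errors='strict' is the default
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# (3) Plain bytes manipulation, valid only because the extension is
# assumed to be exactly four bytes long ('.JPG').
png_name = raw_name[:-4] + b".png"
```

The explicit `surrogateescape` decode is there only to make the surrogate visible independently of the locale; in application code the `os.fsdecode`/`os.fsencode` pair at the boundary is the part that matters.
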
    (Doing jpeg_path.rindex(b'.') is left as an exercise for the
    reader. Part (i): Is it really worth it?)

    If you want the readability of a rich path library and the
    efficiency of bytes, you *may* have the option of using Ethan's
    Antipathy (or whatever). If you can't use Antipathy, use bytes
    methods directly, or accept that it isn't *that* inefficient and
    use pathlib. At this point, I think this subcase is just FUD; no
    real examples were presented where the efficiency hit of
    encoding/decoding gets in the way of getting work done using
    pathlib.

    If you need to stick to stdlib for some reason (eg, to use a
    higher-level library that uses pathlib), live with the
    "compatibility layer"'s inefficiency. Decoding and encoding are
    actually rather low-cost operations at path lengths (PATHMAX=256
    was common, not so long ago!). Most high-level libraries will
    impose a lot more overhead elsewhere, and calling into pathlib by
    itself will add a certain amount of overhead as well.

(4) Lack of transparency/readability for "simple" operations. If
    Antipathy is something you can use, I agree it's plausible that
    avoiding a few os.fsdecode and os.fsencode calls would look nicer,
    but this is really a style question.

    My take: I think of paths as human-readable, so presenting them as
    str (not bytes) is important to me, important enough that I
    advocate that position to other developers. If you do the
    conversion at the boundary between a bytes-y module and pathlib
    ("compatibility layer") I don't see how it affects readability of
    the path manipulation code, while data marshaling at boundaries is
    an expected fact of software development. YMMV.

(0) is thus a non-issue. (1) is not something that can be addressed by
general principles, let alone language design.
(2)-(4) are all real issues regardless of how I feel they should be
resolved :-), but they're all design trade-offs, not things that can
completely block you from getting some kinds of work done in your own
style (eg, the situation str-minded people were in before PEP 383).

Python 3 is an example of how language design can help alleviate
issues like (2), by discouraging that use case in various ways.

Brett's PEP is an example of how language design can help alleviate
issues like (3) and (4). In particular, it helps us to interface
pathlib to open() and friends in a very natural, readable way, without
explicit conversions that should be unnecessary by the nature of the
operation and its arguments. By contrast, the conversion of bytes to
str is important to do explicitly because they are different
representations of the same thing, and it's important that readers be
notified of that change of representation.

 > Or am I wrong here somewhere?

Well, considering the length of this irrelevant-to-the-PEP subthread,
arguably you are feeding a successful troll. I hope that having
posted the above, in the future there will be *one*, *short* reply to
such questions:

    Not a problem. Read PEP 383.

and the thread will end there.

Steve

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev