On 12 April 2016 at 06:28, Stephen J. Turnbull <step...@xemacs.org> wrote:
> Donald Stufft writes:
>
> > I think yes and yes [__fspath__ and fspath should be allowed to
> > handle bytes, otherwise] it seems like making it needlessly harder
> > to deal with a bytes path
>
> It's not needless. This kind of polymorphism makes it hard to review
> code locally. Once bytes get a foothold inside a text application,
> they metastasize altogether too easily, and you end up with TypeErrors
> or UnicodeErrors quite far from the origin. Debugging often requires
> tracing data flows over hill and over dale while choking from the
> dusty trail, or band-aids like a top-level "except UnicodeError:
> log_and_quarantine(bytes)". I can't prove that returning bytes from
> these APIs is a big risk in this sense, but I can't see a way to prove
> that it's not, either, given that their point is duck-typing, and
> therefore they may be generalized in the future, and by third parties.
>
> I understand that there are applications where it's bytes all the way
> down, but by the very nature of computing systems, there are systems
> where bytes are decoded to text. For historical reasons (the encoding
> Tower of Babel), it's very error-prone to do that on demand. Best
> practice is to do the conversion as close to the boundary as possible,
> and process only text internally.
>
> In text applications, "bytes as carcinogen" is an apt metaphor.
>
> Now, I'm not Dutch, so I can't tell you it's obvious that the risk to
> text-processing applications is more important than the inconvenience
> to byte-shoveling applications. But there is a need to be
> parsimonious with polymorphism.

As someone who has done a lot of work helping projects to port from
the 2.x bytes/text model to the 3.x model, I have similar concerns:
with the proposed "return either" approach, rooting out the source of
bytes objects appearing in a program could become a real problem.

The most effective tool I have found for fixing programs with
text/bytes issues is carefully and thoroughly annotating precisely
which functions accept and return bytes, and which accept and return
text. The sort of mixed-mode processing we're talking about here makes
that substantially harder. And note that os.fspath, as proposed, can
return bytes or text *independent* of the type of the argument - it's
not a "bytes in, bytes out" function like the usual pattern of
"polymorphic support for bytes".

But just like Stephen, I have no feel for how significant the risk
will be in real life. I've never worked on code that actually has a
need for bytestring paths (particularly now that surrogateescape
ensures that most cases "just work").

Paul
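
To make the "independent of the type of the argument" point concrete,
here is a minimal sketch. It assumes the os.fspath behaviour that later
shipped in Python 3.6 (accepting str, bytes, or an object whose
__fspath__ returns either), which may differ in detail from the proposal
under discussion. BytesBackedPath and text_fspath are hypothetical names
invented for illustration only.

```python
import os


class BytesBackedPath:
    """Hypothetical path object whose __fspath__ yields bytes."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def __fspath__(self) -> bytes:
        return self._raw


def text_fspath(path) -> str:
    """Hypothetical strict wrapper: only accept paths that resolve to str.

    Failing here, at the boundary, keeps bytes from leaking into
    text-only code and surfacing as an error far from the origin.
    """
    result = os.fspath(path)
    if not isinstance(result, str):
        raise TypeError(
            "expected a text path, got "
            f"{type(result).__name__} from {type(path).__name__}"
        )
    return result


obj = BytesBackedPath(b"/tmp/data")

# The argument is neither str nor bytes, yet the result is bytes:
# this is not a "bytes in, bytes out" function.
returned_type = type(os.fspath(obj))

# The strict wrapper passes text through unchanged and rejects obj.
text_result = text_fspath("/tmp/data")
try:
    text_fspath(obj)
    rejected = False
except TypeError:
    rejected = True
```

A wrapper of this kind is one way to keep the "annotate which functions
return text" discipline: callers that need text get a str or an
immediate TypeError, rather than a bytes object that travels further
into the program.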