On 14 January 2014 16:04, Guido van Rossum <gu...@python.org> wrote:
> On Mon, Jan 13, 2014 at 9:34 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> I've now looked at asciistr. (Thanks Glenn and Ethan for the link.)
>
> Now that I (hopefully) understand it, I'm worried that a text
> processing algorithm that uses asciistr might under hard-to-predict
> circumstances (such as when the arguments contain nothing of interest
> to the algorithm) might return an asciistr instance instead of a str
> or bytes instance, and this might confuse a caller (e.g. isinstance()
> checks might fail, dict lookups, or whatever -- it feels like the
> problem is similar to creating the perfect proxy type).

Right, asciistr is designed for a specific kind of hybrid API where
you want to accept binary input (and produce binary output) *and* you
want to accept text input (and produce text output). Porting those
from Python 2 to Python 3 is painful not because of any limitations of
the str or bytes API but because it's the only use case I have found
where I actually *missed* the implicit interoperability offered by the
Python 2 str type.

It's not an implementation style I would consider appropriate for the
standard library - we need to code very defensively in order to aid
debugging in arbitrary contexts, so I consider having an API like
urllib.parse demand 7-bit ASCII in the binary version, and require
text to handle impure input to be a better design choice.

However, in an environment where you can place greater preconditions
on your inputs (such as "ensure all input data is ASCII compatible")
and you're willing to tolerate the occasional obscure traceback for
particular kinds of errors, then it should be a convenient way to use
common constants (like separators or URL scheme names) in an algorithm
that can manipulate either binary or text, but not a combination of
the two (the latter is still a nice improvement in correctness over
Python 2, which allowed them to be mixed freely rather than requiring
consistency across the inputs).

It's still slightly different from Python 2, though.

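For comparison, here is a minimal sketch of the more explicit style
that works today without asciistr: a helper that accepts either text
or binary input and picks its separator constant to match the input
type. The name split_scheme and the "://" separator are illustrative
assumptions only; this is not code from urllib.parse.

```python
def split_scheme(url):
    """Split 'scheme://rest' for either str or bytes input.

    Hypothetical helper for illustration only; not a urllib.parse API.
    """
    # Pick the constant in the same domain as the input, so text and
    # binary data are never mixed.
    if isinstance(url, str):
        sep = "://"
    elif isinstance(url, bytes):
        sep = b"://"
    else:
        raise TypeError("expected str or bytes, got %s" % type(url).__name__)

    scheme, found, rest = url.partition(sep)
    if not found:
        # url[:0] is an empty value of the same type as the input.
        return url[:0], url
    return scheme, rest
```

The stated aim of asciistr is that a single constant could be combined
with either input type, so the isinstance() branch above would no
longer be needed.
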
In Python 2, the interaction model was:

    str & str -> str
    str & unicode -> unicode

(with the one exception being str.format: that consistently produces
str rather than promoting to Unicode)

My goal for asciistr is that it should exhibit the following behaviour:

    str & asciistr -> str
    asciistr & asciistr -> str (making it asciistr would be a pain and
        I don't have a use case for that)
    bytes & asciistr -> bytes

So in code like that in urllib.parse (but in a more constrained
context), you could just switch all your constants to asciistr, change
your indexing operations to length 1 slices and then in theory
essentially the same code that worked in Python 2 should also work in
Python 3.

However, Benno is finding that my warning about possible
interoperability issues was accurate - we have various places where we
do PyUnicode_Check() rather than PyUnicode_CheckExact(), which means
we don't always notice a PEP 3118 buffer interface if it is provided
by a str subclass. We'll look at those as we find them, and either
work around them (if we can), decide not to support that behaviour in
asciistr, or else I'll create a patch to resolve the interoperability
issue.

It's not necessarily a type I'd recommend using in production code, as
there *will* always be a more explicit alternative that doesn't rely
on a tricksy C extension type that only works in CPython. However,
it's a type I think is worth having implemented and available on PyPI,
even if it's just to disprove the claim that you *can't* write that
kind of code in Python 3.

>> PEP 460 should actually make asciistr easier in the long run, as I now
>> expect we'll run into some "interesting" issues getting formatting to
>> produce anything other than text (contrary to what I said elsewhere in
>> these threads - I hadn't thought through the full implications at the
>> time).
>
> For example?

asciistr is a str subclass, so its formatting methods currently
operate in the text domain and produce str output.
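
To illustrate with an ordinary pure-Python str subclass (a stand-in
with a made-up name, not asciistr itself): both interpolation styles
are inherited from str, so the result is always plain text, never
bytes.

```python
class TextConstant(str):
    """Hypothetical str subclass standing in for an asciistr-like type."""


template = TextConstant("%s://%s")

# Both interpolation styles are inherited from str and operate on text.
percent_result = template % ("http", "example.com")
format_result = TextConstant("{}://{}").format("http", "example.com")

print(type(percent_result).__name__, percent_result)  # str http://example.com
print(type(format_result).__name__, format_result)    # str http://example.com
```
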
Getting it to do otherwise is actually a task on the scale of
implementing ASCII interpolation operations on the native bytes type.

This realisation was the *other* factor that made me more comfortable
with the idea of adding ASCII interpolation to the core bytes type - I
previously thought asciistr could easily handle it, but it doesn't
(except in the pure ASCII case where it could theoretically just
encode at the end), thus also knocking out my "we can easily do this
in an extension type, there's no need to provide it in the builtins"
argument.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev