>> Clearly the unquote is str->bytes, <snip> You can't pass a Unicode string
>> back as the result of unquote *without* passing in an encoding specifier,
>> because the character set is application-specific.
>
> So for unquote you're suggesting that it always return a bytes object
> UNLESS an encoding is specified? As in:
>
> >>> urllib.parse.unquote('h%C3%BCllo')
> b'h\xc3\xbcllo'

Yes, that's correct. That's what the RFC says we have to do.

> I would object to that on two grounds. Firstly, I wouldn't expect or
> desire a bytes object. The vast majority of uses for unquote will be
> to get a character string out, not bytes. Secondly, there is a
> mountain of code (including about 12 modules in the standard library)
> which call unquote and don't give the user the encoding option, so
> it's best if we pick a default that is what the majority of users will
> expect. I argue that that's UTF-8.

Unfortunately, despite your expectations or desires, the spec doesn't
allow us that luxury. It's bytes out, and they may even be in a
non-standard (not registered with IANA) encoding. Without knowing that
encoding, there's no way to safely and correctly turn that sequence of
bytes into a string. If other modules have been misusing the
interface, they are buggy and should be fixed. There's a lot of buggy
stdlib code in Python around the older Web standards.

I think it would be great to have another function, unquote_to_string,
which takes an extra "encoding" parameter and returns a string. It
would also be OK, I think, to add a keyword parameter to "unquote"
which provides an encoding and causes unquote to return a string. But
the standard behavior has to be to return bytes.

> I'd prefer having a separate unquote_raw function which is
> str->bytes, and the unquote function performs the same role as it
> always have, which is str->str.

Actually, it was originally bytes->bytes, because there was no notion of
Unicode strings when it was added. It perhaps got misunderstood during
the addition of Unicode support to Python; many people have had trouble
wrapping their heads around all this, myself included.

Bill
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
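
For reference, the split discussed above (bytes by default, a string only when the caller names an encoding) can be sketched roughly as follows. This is only an illustration, not the stdlib implementation: the names unquote_raw and unquote_to_string are the ones floated in this thread, their signatures are assumed, and the sketch assumes the input contains only ASCII characters.

```python
import re

# Matches one percent-escape such as %C3 in an ASCII byte string.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def unquote_raw(s):
    """Hypothetical str -> bytes unquote: decode the escapes, guess nothing."""
    # Assumption: the URI text is pure ASCII, so this encode cannot lose data.
    data = s.encode("ascii")
    return _ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1).decode("ascii")), data
    )


def unquote_to_string(s, encoding):
    """Hypothetical str -> str unquote: the caller must supply the charset."""
    return unquote_raw(s).decode(encoding)


if __name__ == "__main__":
    # The same visible text, percent-encoded under two different charsets.
    print(unquote_raw("h%C3%BCllo"))                   # UTF-8 bytes
    print(unquote_raw("h%FCllo"))                      # Latin-1 bytes
    print(unquote_to_string("h%C3%BCllo", "utf-8"))
    print(unquote_to_string("h%FCllo", "latin-1"))
```

Both inputs stand for the same word, but only the application knows which charset produced the bytes. Decoding 'h%FCllo' as UTF-8 raises UnicodeDecodeError, which is the reason for making the encoding explicit.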