2009/4/10 Nick Coghlan <ncogh...@gmail.com>:
> gl...@divmod.com wrote:
>> On 03:21 am, ncogh...@gmail.com wrote:
>>> Given that json is a wire protocol, that sounds like the right approach
>>> for json as well. Once bytes-everywhere works, then a text API can be
>>> built on top of it, but it is difficult to build a bytes API on top of a
>>> text one.
>>
>> I wish I could agree, but JSON isn't really a wire protocol. According
>> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the
>> serialization of structured data". There are some notes about encoding,
>> but it is very clearly described in terms of unicode code points.
>
> Ah, my apologies - if the RFC defines things such that the native format
> is Unicode, then yes, the appropriate Python 3.x data type for the base
> implementation would indeed be strings.

Indeed, the RFC seems to clearly imply that loads should take a Unicode
string, dumps should produce one, and load/dump should work in terms
of text files (not byte files).

On the other hand, further down in the document:

"""
3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.
"""

This is at best confused (in my utterly non-expert opinion :-)) as
Unicode isn't an encoding...

I would guess that what the RFC is trying to say is that JSON is text
(Unicode) and where a byte stream purporting to be JSON is encountered
without a defined encoding, this is how to guess one.

That implies that loads can/should also allow bytes as input, applying
the given algorithm to guess an encoding. And similarly load
can/should accept a byte stream, on the same basis. (There's no need
to allow the possibility of accepting bytes plus an encoding - in that
case the user should decode the bytes before passing Unicode to the
JSON module).

An alternative might be for the JSON module to register a special
encoding ('JSON-guess'?) which captures the rules here. Then there's
no need for special bytes parameter handling.

Of course, this is all from a native English speaker, who therefore
has no idea of the real life issues involved in Unicode :-)

Paul.
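
P.S. For concreteness, here is a minimal sketch of the guessing rule
from section 3 of the RFC, as I read it. The names guess_json_encoding
and loads_bytes are made up for illustration and are not part of the
json module. Falling back to UTF-8 for inputs shorter than four octets
is my own assumption, and BOM handling is left out entirely.

```python
import json

# Null patterns in the first four octets, following RFC 4627 section 3.
# True marks a null octet, False marks a non-null octet.
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}


def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON octet stream (hypothetical helper)."""
    head = data[:4]
    if len(head) < 4:
        # Assumption: fewer than four octets can only be a two-character
        # UTF-8 text such as b"[]" or b"{}".
        return "utf-8"
    nulls = tuple(octet == 0 for octet in head)
    # Any other pattern (typically no nulls at all) is treated as UTF-8.
    return _NULL_PATTERNS.get(nulls, "utf-8")


def loads_bytes(data: bytes):
    """Decode with the guessed encoding, then parse as text (hypothetical)."""
    return json.loads(data.decode(guess_json_encoding(data)))
```

With this, the text API stays the base implementation and bytes input
is just a thin decoding step in front of it, e.g.
loads_bytes('{"a": 1}'.encode("utf-16-le")).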