On Thu, Apr 9, 2009 at 07:15, Antoine Pitrou <solip...@pitrou.net> wrote:
> The RFC also specifies a discrimination algorithm for non-supersets of ASCII
> (“Since the first two characters of a JSON text will always be ASCII
> characters [RFC0020], it is possible to determine whether an octet
> stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
> at the pattern of nulls in the first four octets.”), but it is not
> implemented in the json module:

Well, your example is bad in the context of the RFC. The RFC states that
JSON-text = object / array, meaning "loads" for '"hi"' isn't strictly valid.
The discrimination algorithm obviously only works in the context of that
grammar, where the first character of a document must be { or [ and the next
character can only be {, [, ], }, f, n, t, ", -, a digit, or insignificant
whitespace (space, \t, \r, \n).

> >>> json.loads('"hi"')
> 'hi'
> >>> json.loads(u'"hi"'.encode('utf16'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
>     return _default_decoder.decode(s)
>   File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded

Cheers,

Dirkjan
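
For reference, the null-pattern check from section 3 of RFC 4627 could be sketched roughly like this. It is only an illustration, not code from the json module: the names detect_json_encoding and loads_octets are hypothetical, the fallback to UTF-8 for inputs shorter than four octets is an assumption, and a leading BOM (which Python's 'utf16' codec prepends) is not handled.

```python
import json


def detect_json_encoding(octets):
    """Guess the encoding of a JSON text from the nulls in its first four octets.

    Patterns as listed in RFC 4627, section 3:
        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8
    """
    head = octets[:4]
    if len(head) < 4:
        # Assumption: anything this short (e.g. b"[]") is treated as UTF-8.
        return "utf-8"
    nulls = tuple(b == 0 for b in head)
    if nulls == (True, True, True, False):
        return "utf-32-be"
    if nulls == (False, True, True, True):
        return "utf-32-le"
    if nulls == (True, False, True, False):
        return "utf-16-be"
    if nulls == (False, True, False, True):
        return "utf-16-le"
    # No recognised null pattern: fall through to UTF-8.
    return "utf-8"


def loads_octets(octets):
    """Decode the octets with the guessed encoding, then parse as JSON."""
    return json.loads(octets.decode(detect_json_encoding(octets)))
```

With this sketch, loads_octets('["hi"]'.encode('utf-16-le')) returns ['hi']. Input that starts with a BOM matches none of the null patterns, so it is treated as UTF-8 and fails to decode.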