On Thu, Apr 9, 2009 at 1:15 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> As for reading/writing bytes over the wire, JSON is often used in the same
> context as HTML: you are supposed to know the charset and decode/encode the
> payload using that charset. However, the RFC specifies a default encoding of
> utf-8. (*)
>
>
> (*) http://www.ietf.org/rfc/rfc4627.txt
>
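The "know the charset, else default to UTF-8" rule quoted above fits in a few lines. This is a minimal sketch, not the json module's API. It assumes the charset, when present, comes from the transport (for example an HTTP Content-Type parameter). The helper names decode_json_payload and encode_json_payload are hypothetical.

```python
import json
from typing import Any, Optional


def decode_json_payload(payload: bytes, charset: Optional[str] = None) -> Any:
    # Hypothetical helper: decode with the charset declared by the transport
    # when there is one, otherwise fall back to RFC 4627's default, UTF-8.
    return json.loads(payload.decode(charset or "utf-8"))


def encode_json_payload(obj: Any, charset: str = "utf-8") -> bytes:
    # Hypothetical helper: ensure_ascii=False keeps non-ASCII characters
    # as-is, so they are encoded with the chosen charset instead of being
    # written as \uXXXX escapes. A charset that cannot represent some
    # character will raise UnicodeEncodeError here.
    return json.dumps(obj, ensure_ascii=False).encode(charset)
```

For example, decode_json_payload(b'{"a": 1}') returns {'a': 1} because no charset was given and UTF-8 is assumed.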

That is one short and sweet RFC. :-)

> The RFC also specifies a discrimination algorithm for non-supersets of ASCII
> (“Since the first two characters of a JSON text will always be ASCII
>   characters [RFC0020], it is possible to determine whether an octet
>   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>   at the pattern of nulls in the first four octets.”), but it is not
> implemented in the json module:
>

Given that the RFC specifies that the encoding used should be one of the
encodings defined by Unicode, wouldn't it be a better idea to remove the
"unicode" support instead? To me, it would make sense to use the RFC's
detection algorithm to sniff the encoding of the JSON stream and then
use the detected encoding to decode the strings embedded in the JSON
stream.
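The "sniff, then decode" idea can be sketched using the null-octet table from section 3 of RFC 4627. This is only a sketch of the RFC's heuristic, not something the json module provides. The function names are hypothetical. It assumes the input has no byte order mark (RFC 4627 does not describe one) and that the text starts with two ASCII characters, as an RFC 4627 object or array does.

```python
import json
from typing import Any


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from which of its first four
    octets are NUL, following the table in RFC 4627, section 3.
    Hypothetical helper; BOMs are not handled."""
    if len(data) >= 4:
        nulls = tuple(b == 0 for b in data[:4])
        if nulls == (True, True, True, False):    # 00 00 00 xx
            return "utf-32-be"
        if nulls == (True, False, True, False):   # 00 xx 00 xx
            return "utf-16-be"
        if nulls == (False, True, True, True):    # xx 00 00 00
            return "utf-32-le"
        if nulls == (False, True, False, True):   # xx 00 xx 00
            return "utf-16-le"
    # xx xx xx xx, or a text too short to sniff: assume the default, UTF-8.
    return "utf-8"


def loads_bytes(data: bytes) -> Any:
    # Hypothetical wrapper: decode with the sniffed encoding, then parse.
    return json.loads(data.decode(detect_json_encoding(data)))
```

With this approach, loads_bytes('["é"]'.encode("utf-16-be")) sees the pattern 00 xx 00 xx, decodes as UTF-16BE and returns ['é'], without the caller having to supply an encoding.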

Cheers,
-- Alexandre
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev