On 24.08.14 03:11, Greg Ewing wrote:
> Isaac Morland wrote:
>> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
>> (byte order mark) is used:
>>
>> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
>>
>> Not sure about XML.
> 
> According to Appendix F here:
> 
> http://www.w3.org/TR/xml/#sec-guessing
> 
> an XML parser needs to be prepared to try all the encodings it
> supports until it finds one that works well enough to decode
> the XML declaration, then it can find out the exact encoding
> used.

That's not what this section says. Instead, it says that
you need to auto-detect UCS-4, UTF-16, or UTF-8 from the BOM
or, if there is no BOM, infer one of those or EBCDIC from how
the leading '<?' is encoded. That is enough to actually parse
the encoding declaration. Other non-ASCII-compatible encodings
can only be used if declared in a higher-level protocol (such
as HTTP).

The parser is not expected to try out all encodings it supports.
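
To illustrate, here is a rough sketch of that detection step, assuming
the byte patterns tabulated in Appendix F. The function name and the
returned labels are made up for this example; they are descriptive
strings, not codec names, and this is not how any particular parser
implements it.

```python
# Hypothetical helper: classify the first bytes of an XML entity into an
# encoding family, following the patterns listed in XML 1.0 Appendix F.
# The result is only meant to be good enough to read the XML declaration.

# With a byte order mark. The 4-byte UCS-4 marks come first because
# FF FE 00 00 also begins with the 2-byte UTF-16 mark FF FE.
_WITH_BOM = [
    (b"\x00\x00\xfe\xff", "UCS-4 big-endian"),
    (b"\xff\xfe\x00\x00", "UCS-4 little-endian"),
    (b"\x00\x00\xff\xfe", "UCS-4 unusual byte order"),
    (b"\xfe\xff\x00\x00", "UCS-4 unusual byte order"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16 big-endian"),
    (b"\xff\xfe", "UTF-16 little-endian"),
]

# Without a byte order mark: how '<' or '<?' (or '<?xm') is encoded.
_WITHOUT_BOM = [
    (b"\x00\x00\x00\x3c", "UCS-4 big-endian"),
    (b"\x3c\x00\x00\x00", "UCS-4 little-endian"),
    (b"\x00\x00\x3c\x00", "UCS-4 unusual byte order"),
    (b"\x00\x3c\x00\x00", "UCS-4 unusual byte order"),
    (b"\x00\x3c\x00\x3f", "UTF-16 big-endian"),
    (b"\x3c\x00\x3f\x00", "UTF-16 little-endian"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),
]


def sniff_xml_encoding_family(data: bytes) -> str:
    """Return a label for the encoding family suggested by the first bytes."""
    for prefix, label in _WITH_BOM + _WITHOUT_BOM:
        if data.startswith(prefix):
            return label
    # No BOM and no recognizable '<?': Appendix F treats this as UTF-8
    # without an encoding declaration.
    return "UTF-8"


if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="UTF-16"?>'.encode("utf-16-le")
    print(sniff_xml_encoding_family(sample))
```

Note that the table is fixed and small: nothing here loops over the
codecs the parser happens to support. Once the family is known, the
declaration can be decoded and the exact encoding read from it.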

Regards,
Martin

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
