On 24.08.14 03:11, Greg Ewing wrote:
> Isaac Morland wrote:
>> In HTML 5 it allows non-ASCII-compatible encodings as long as U+FEFF
>> (byte order mark) is used:
>>
>> http://www.w3.org/TR/html-markup/syntax.html#encoding-declaration
>>
>> Not sure about XML.
>
> According to Appendix F here:
>
> http://www.w3.org/TR/xml/#sec-guessing
>
> an XML parser needs to be prepared to try all the encodings it
> supports until it finds one that works well enough to decode
> the XML declaration, then it can find out the exact encoding
> used.

That's not what this section says. Instead, it says that you need to
auto-detect UCS-4, UTF-16, and UTF-8 from the BOM or, if there is no BOM,
guess them (or EBCDIC) from the way '<?' is encoded. This should be
enough to actually parse the encoding declaration.

Other non-ASCII-compatible encodings can only be used if they are
declared in a higher-level protocol (such as HTTP). The parser is not
expected to try out all the encodings it supports.

Regards,
Martin
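
P.S. For illustration, the detection described in Appendix F can be
sketched roughly as follows. This is a simplified sketch under stated
assumptions, not a complete implementation: the function name
guess_xml_encoding_family and the labels it returns are made up for this
example, and the unusual UCS-4 byte orders listed in the appendix are
left out. The result only identifies an encoding family that is good
enough to read the encoding declaration; the declaration still has to be
parsed to learn the exact encoding.

```python
def guess_xml_encoding_family(data: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes.

    Hypothetical helper; labels are illustrative, not registered names.
    """
    head = data[:4]

    # Case 1: a byte order mark is present.
    # The 4-byte UCS-4 marks must be tested before the 2-byte UTF-16
    # marks, because FF FE 00 00 also starts with the UTF-16 LE mark.
    with_bom = [
        (b"\x00\x00\xfe\xff", "ucs-4-be"),
        (b"\xff\xfe\x00\x00", "ucs-4-le"),
        (b"\xfe\xff", "utf-16-be"),
        (b"\xff\xfe", "utf-16-le"),
        (b"\xef\xbb\xbf", "utf-8"),
    ]
    for prefix, family in with_bom:
        if head.startswith(prefix):
            return family

    # Case 2: no BOM; look at how the leading '<' or '<?' is encoded.
    without_bom = [
        (b"\x00\x00\x00\x3c", "ucs-4-be"),
        (b"\x3c\x00\x00\x00", "ucs-4-le"),
        (b"\x00\x3c\x00\x3f", "utf-16-be"),
        (b"\x3c\x00\x3f\x00", "utf-16-le"),
        (b"\x3c\x3f\x78\x6d", "ascii-compatible"),  # '<?xm' in ASCII
        (b"\x4c\x6f\xa7\x94", "ebcdic"),            # '<?xm' in EBCDIC
    ]
    for prefix, family in without_bom:
        if head.startswith(prefix):
            return family

    # Case 3: nothing matched, so there is no recognisable declaration;
    # without outside information the entity is treated as UTF-8.
    return "utf-8-default"
```

Note that there is no loop over every codec the parser happens to
support: a fixed, small table of byte patterns is all that is consulted.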