Guido van Rossum schrieb: > On 7/22/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: >> > Sure, normally XML is serialized to bytes, but it is also >> > serializable to unicode, and that's a useful feature to have (if >> > implementable). >> >> It's not reasonably implementable; users who have use cases >> will have to encode as UTF-8 first. > > Now I'm confused. Are we proposing that all our XML APIs read and > write encoded bytes, or are we proposing that they read and write > Unicode strings, leaving the encoding/decoding to the I/O stream?
Unicode strings in both cases. I was not talking about writing at all; pyexpat only does reading (aka parsing). It returns Unicode strings, but processes bytes. > I > thought the latter was preferred but now it looks like you're arguing > for the former? The XML parser input stream should be byte-oriented. XML has its own notion of input encoding (expressed in the XML declaration, <?xml...); it's the job of the parser to figure it out. Having the user provide a character-oriented stream to the parser is both inconvenient and error-prone: the application would have to figure out the encoding itself first. Regards, Martin _______________________________________________ Python-3000 mailing list [email protected] http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
