Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

Martin v. Löwis Sun, 22 Jul 2007 09:30:20 -0700

Guido van Rossum wrote:
> On 7/22/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> > Sure, normally XML is serialized to bytes, but it is also
>> > serializable to unicode, and that's a useful feature to have (if
>> > implementable).
>>
>> It's not reasonably implementable; users who have use cases
>> will have to encode as UTF-8 first.
> 
> Now I'm confused. Are we proposing that all our XML APIs read and
> write encoded bytes, or are we proposing that they read and write
> Unicode strings, leaving the encoding/decoding to the I/O stream?


Unicode strings in both cases.

I was not talking about writing at all; pyexpat only does reading
(aka parsing). It returns Unicode strings, but processes bytes.

> I thought the latter was preferred but now it looks like you're
> arguing for the former?

The XML parser input stream should be byte-oriented. XML has its own
notion of input encoding (expressed in the XML declaration, <?xml...);
it's the job of the parser to figure it out. Having the user provide
a character-oriented stream to the parser is both inconvenient and
error-prone: the application would have to figure out the encoding
itself first.
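
A minimal sketch of this point, assuming the Python 3 `xml.parsers.expat` module (the public name for pyexpat). The helper `collect_text` and the two sample documents are made up for illustration. The same element is serialized in two encodings; the parser is handed raw bytes each time, reads the encoding from the XML declaration, and delivers Unicode strings to the handler.

```python
import xml.parsers.expat


def collect_text(data: bytes) -> str:
    """Parse an XML document given as bytes and return its character data."""
    parser = xml.parsers.expat.ParserCreate()  # no encoding override supplied
    chunks = []
    # Character data can arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return "".join(chunks)


# Hypothetical sample documents: same content, different byte encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>L\u00f6wis</name>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="utf-8"?><name>L\u00f6wis</name>'
).encode("utf-8")

if __name__ == "__main__":
    # The byte sequences differ, but the handler sees the same Unicode text.
    print(collect_text(latin1_doc) == collect_text(utf8_doc))
```

The caller never decodes anything itself. Had the application been required to pass a character stream, it would first have had to inspect the declaration in order to pick the right codec, which is the work the parser already does.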

Regards,
Martin
