Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-22 Thread Talin
Greg Ewing wrote:
> Guido van Rossum wrote:
>> Now I'm confused. Are we proposing that all our XML APIs read and
>> write encoded bytes, or are we proposing that they read and write
>> Unicode strings, leaving the encoding/decoding to the I/O stream?
>
> The design of XML seems a bit braindamaged

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-22 Thread Greg Ewing
Guido van Rossum wrote:
> Now I'm confused. Are we proposing that all our XML APIs read and
> write encoded bytes, or are we proposing that they read and write
> Unicode strings, leaving the encoding/decoding to the I/O stream?

The design of XML seems a bit braindamaged here, with the encoding spe

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-22 Thread Martin v. Löwis
Guido van Rossum wrote:
> On 7/22/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> > Sure, normally XML is serialized to bytes, but it is also
>> > serializable to unicode, and that's a useful feature to have (if
>> > implementable).
>>
>> It's not reasonably implementable; users who have use

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-22 Thread Fred L. Drake, Jr.
On Sunday 22 July 2007, Guido van Rossum wrote:
> Now I'm confused. Are we proposing that all our XML APIs read and
> write encoded bytes, or are we proposing that they read and write
> Unicode strings, leaving the encoding/decoding to the I/O stream? I
> thought the latter was preferred but no

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-22 Thread Guido van Rossum
On 7/22/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Sure, normally XML is serialized to bytes, but it is also
> > serializable to unicode, and that's a useful feature to have (if
> > implementable).
>
> It's not reasonably implementable; users who have use cases
> will have to encode as UT

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-22 Thread Martin v. Löwis
> Sure, normally XML is serialized to bytes, but it is also
> serializable to unicode, and that's a useful feature to have (if
> implementable).

It's not reasonably implementable; users who have use cases will have to encode as UTF-8 first.

Regards,
Martin
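A minimal sketch of the workaround Martin describes, assuming a present-day Python 3 `xml.parsers.expat` rather than the 2007 code under discussion: a document held as a `str` is encoded to UTF-8 and handed to `ParseFile` through a binary stream. The sample document and the `elements` and `text_parts` lists are illustrative only. The document deliberately carries no encoding declaration, so the UTF-8 bytes cannot contradict it.

```python
import io
import xml.parsers.expat

# Illustrative document held as a Unicode string (no encoding declaration).
document = "<greeting lang='fr'>caf\u00e9</greeting>"

elements = []    # (name, attributes) for each start tag seen
text_parts = []  # character data chunks reported by the parser

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True  # coalesce adjacent character data where possible
parser.StartElementHandler = lambda name, attrs: elements.append((name, attrs))
parser.CharacterDataHandler = text_parts.append

# Encode as UTF-8 first, then give the parser a binary stream to read from.
parser.ParseFile(io.BytesIO(document.encode("utf-8")))

print(elements)
print("".join(text_parts))
```

The handlers receive decoded `str` values, so the encoding step stays confined to the boundary between the caller's string and the parser's input.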

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-21 Thread Talin
James Y Knight wrote:
> On Jul 21, 2007, at 12:25 AM, Fred L. Drake, Jr. wrote:
>
>> On Saturday 21 July 2007, Joe Gregorio wrote:
>>> Should xml.parsers.expat.XMLParser.ParseFile(file) operate on
>>> both text and binary streams?
>> No. XML is a serialization of a markup language containing Unic

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-21 Thread Fred L. Drake, Jr.
On Saturday 21 July 2007, James Y Knight wrote:
> Well...there's many reasons why it is useful to be able to parse an
> already-decoded unicode stream into XML, and to serialize XML into a
> unicode string. For example, if combining into a larger unicode
> document, or parsing from a literal st

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-21 Thread James Y Knight
On Jul 21, 2007, at 12:25 AM, Fred L. Drake, Jr. wrote:
> On Saturday 21 July 2007, Joe Gregorio wrote:
>> Should xml.parsers.expat.XMLParser.ParseFile(file) operate on
>> both text and binary streams?
>
> No. XML is a serialization of a markup language containing Unicode
> character into an

Re: [Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-20 Thread Fred L. Drake, Jr.
On Saturday 21 July 2007, Joe Gregorio wrote:
> Should xml.parsers.expat.XMLParser.ParseFile(file) operate on
> both text and binary streams?

No. XML is a serialization of a markup language containing Unicode characters into an encoded stream.

-Fred

--
Fred L. Drake, Jr. __

[Python-3000] str/unicode tests: pyexpat.c and read(n)

2007-07-20 Thread Joe Gregorio
Should xml.parsers.expat.XMLParser.ParseFile(file) operate on both text and binary streams? If it should operate on text streams then an issue arises from "read(n)" meaning different things for text and binary streams. If the stream passed in is "text" then read(n) will read 'n' unicode characters
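The difference in what read(n) means can be seen in a small sketch, assuming the present-day Python 3 `io` module (the I/O library was still being designed when this thread was written). The sample string and variable names are illustrative only. The same UTF-8 data is read once through a binary stream and once through a text stream, with n = 4 in both cases.

```python
import io

text = "caf\u00e9 \u20ac"       # 6 characters: c, a, f, e-acute, space, euro sign
data = text.encode("utf-8")     # 9 bytes: e-acute takes 2 bytes, the euro sign 3

binary_stream = io.BytesIO(data)
text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")

chunk_bytes = binary_stream.read(4)  # up to 4 bytes
chunk_text = text_stream.read(4)     # up to 4 characters

print(chunk_bytes)                          # b'caf\xc3' (cuts e-acute in half)
print(chunk_text)                           # café
print(len(chunk_text.encode("utf-8")))      # 5 (bytes consumed for 4 characters)
```

A caller that sizes a byte buffer from n therefore cannot assume a text stream will return at most n bytes' worth of data, and a binary read of n bytes may end in the middle of a multi-byte character.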