Re: elementtree and gbk encoding

2006-03-15 Thread Steven Bethard
Fredrik Lundh wrote: > Steven Bethard wrote: > >> Hmm... I downloaded the newest cElementTree (and I already had the >> newest ElementTree), and here's what I get: > >> >>> tree = myparser(filename, 'gbk') >> Traceback (most recent call last): >>File "", line 1, in ? >>File "", line 8,

Re: elementtree and gbk encoding

2006-03-15 Thread Fredrik Lundh
Steven Bethard wrote: > Hmm... I downloaded the newest cElementTree (and I already had the > newest ElementTree), and here's what I get: > >>> tree = myparser(filename, 'gbk') > Traceback (most recent call last): >File "", line 1, in ? >File "", line 8, in myparser > SyntaxError: not we

Re: elementtree and gbk encoding

2006-03-15 Thread Steven Bethard
Fredrik Lundh wrote: > Steven Bethard wrote: > >> I'm having trouble using elementtree with an XML file that has some >> gbk-encoded text. (I can't read Chinese, so I'm taking their word for >> it that it's gbk-encoded.) I always have trouble with encodings, so I'm >> sure I'm just screwing some

Re: elementtree and gbk encoding

2006-03-15 Thread Fredrik Lundh
Diez B. Roggisch wrote: > Interestingly enough, that need not be the case. A document can very well > be well-formed without a header. The constraints for well-formedness are > scattered throughout the spec, so I'm not sure what they say about the used > encoding in absence of a header. if ther

Re: elementtree and gbk encoding

2006-03-15 Thread Diez B. Roggisch
> no, the parser must not choke on a file for which the encoding has been > overridden. > > for example, the HTTP standard allows the transport layer to recode text/* > resources as long as it updates the charset properly, so if you e.g. send > an XML document as text/xml and charset=iso-8859-
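A minimal sketch of the envelope-overrides-document idea quoted above, assuming Python 3 and the standard library only. The helper name `parse_with_envelope_charset` and the sample header value are hypothetical and do not come from the thread; the sketch decodes the body with the charset named in a Content-Type header and hands the resulting text to ElementTree.

```python
from email.message import Message
import xml.etree.ElementTree as ET


def parse_with_envelope_charset(body, content_type, default="utf-8"):
    """Parse XML bytes using the charset from a Content-Type header value.

    Hypothetical helper: the envelope's charset takes precedence, and
    `default` is used only when the header names no charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    # Decode to text first, so the parser never has to guess the byte encoding.
    return ET.fromstring(body.decode(charset))


# Sample body with no XML declaration, recoded to ISO-8859-1 "in transit".
body = "<note>caf\u00e9</note>".encode("iso-8859-1")
root = parse_with_envelope_charset(body, "text/xml; charset=iso-8859-1")
```

Decoding before parsing keeps the charset decision in one place, which is convenient when the transport layer may have recoded the document.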

Re: elementtree and gbk encoding

2006-03-15 Thread Fredrik Lundh
Diez B. Roggisch wrote: >> good advice, but note that an envelope (e.g. an HTTP request or response >> body) may override the encoding in the XML file itself. if this arrives >> in a MIME message with the proper charset information, it's perfectly okay >> to leave out the encoding from the file. >

Re: elementtree and gbk encoding

2006-03-15 Thread Diez B. Roggisch
> pyexpat has only limited support for non-standard encodings; the core > expat library only supports UTF-8, UTF-16, US-ASCII, and ISO-8859-1, > and the Python glue layer then adds support for all byte-to-byte > encodings supported by Python on top of that. Interesting. Maybe 4suite is more compl
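Since gbk is a multi-byte encoding, it falls outside the byte-to-byte encodings described above. One possible workaround, sketched here under the assumption of Python 3 and the standard-library `xml.etree.ElementTree` (the thread itself used the 2006-era standalone elementtree package), is to decode the bytes in Python and pass the parser text instead. The helper name `parse_gbk_xml` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_gbk_xml(raw_bytes, encoding="gbk"):
    """Decode the bytes with Python's own codec, then parse the text.

    Hypothetical helper: the parser is never asked to handle the
    multi-byte encoding itself.
    """
    text = raw_bytes.decode(encoding)
    return ET.fromstring(text)


# Sample gbk-encoded document without an XML declaration, as in the thread.
sample = "<doc><title>\u4e2d\u6587</title></doc>".encode("gbk")
root = parse_gbk_xml(sample)
title = root.find("title").text
```

This relies on Python's codec machinery, which does support gbk, rather than on expat's encoding support.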

Re: elementtree and gbk encoding

2006-03-15 Thread Diez B. Roggisch
Hi, > good advice, but note that an envelope (e.g. an HTTP request or response > body) may override the encoding in the XML file itself. if this arrives > in a MIME message with the proper charset information, it's perfectly okay > to leave out the encoding from the file. It might be practical - s

Re: elementtree and gbk encoding

2006-03-14 Thread Fredrik Lundh
Diez B. Roggisch wrote: > 2) your xml is _not_ well-formed, as it doesn't contain an xml-header! > You need to ask these guys to deliver the xml with header. Of course for > now it is ok to just prepend the text with something like <?xml version="1.0" encoding="gbk"?>. But I'd still request them to deliv

Re: elementtree and gbk encoding

2006-03-14 Thread Fredrik Lundh
Steven Bethard wrote: > I'm having trouble using elementtree with an XML file that has some > gbk-encoded text. (I can't read Chinese, so I'm taking their word for > it that it's gbk-encoded.) I always have trouble with encodings, so I'm > sure I'm just screwing something simple up. Can anyone

Re: elementtree and gbk encoding

2006-03-14 Thread Steven Bethard
Diez B. Roggisch wrote: >> Here's what I get with the prepending hack: >> >> >>> et.fromstring('\n' + >> open(filename).read()) >> Traceback (most recent call last): >> File "", line 1, in ? >> File "C:\Program >> Files\Python\lib\site-packages\elementtree\ElementTree.py", line 960, >> in X

Re: elementtree and gbk encoding

2006-03-14 Thread Diez B. Roggisch
> Here's what I get with the prepending hack: > > >>> et.fromstring('\n' + > open(filename).read()) > Traceback (most recent call last): > File "", line 1, in ? > File "C:\Program > Files\Python\lib\site-packages\elementtree\ElementTree.py", line 960, in > XML > parser.feed(text) > F

Re: elementtree and gbk encoding

2006-03-14 Thread Steven Bethard
Diez B. Roggisch wrote: > Steven Bethard schrieb: >> I'm having trouble using elementtree with an XML file that has some >> gbk-encoded text. (I can't read Chinese, so I'm taking their word for >> it that it's gbk-encoded.) I always have trouble with encodings, so >> I'm sure I'm just screwing

Re: elementtree and gbk encoding

2006-03-14 Thread Diez B. Roggisch
Steven Bethard schrieb: > I'm having trouble using elementtree with an XML file that has some > gbk-encoded text. (I can't read Chinese, so I'm taking their word for > it that it's gbk-encoded.) I always have trouble with encodings, so I'm > sure I'm just screwing something simple up. Can any