On 14 Jan 2007, at 18:37, Dieter Maurer wrote:
Philipp von Weitershausen wrote at 2007-1-14 14:59 +0100:
Traditionally, you parse an 8-bit string, figure out its encoding (e.g. from <?xml encoding="utf-8"?>), and return some representation of that XML with Unicode data. That's why it's actually quite OK for XML parsers to
only accept (8-bit) string data.
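A minimal sketch of that traditional flow, assuming Python 3 and the stdlib `xml.parsers.expat` binding (the helper name `collect_text` is hypothetical). The parser receives raw bytes, picks up the encoding from the XML declaration, and hands back character data as already-decoded `str`:

```python
import xml.parsers.expat


def collect_text(data: bytes) -> str:
    """Hypothetical helper: parse an 8-bit byte string, return its text as str."""
    chunks = []
    # No encoding argument: expat detects it from the BOM / XML declaration.
    parser = xml.parsers.expat.ParserCreate()
    # The handler receives decoded Unicode text, possibly split into pieces.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return "".join(chunks)


# Bytes encoded as ISO-8859-1, with a matching declaration.
raw = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
assert collect_text(raw) == "caf\u00e9"  # decoded by the parser, not the caller
```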

Parsing usually means rebuilding the structure from a text string and *NOT*
encoding guessing or Unicode decoding.

Therefore, it is actually quite stupid for a parser
to try to encode an already decoded string (i.e. a Unicode string)
only so that it can guess the encoding ;-)
A halfway intelligent parser would accept Unicode when it gets it
and concentrate on the remaining part of its task: either reporting
structural events or building a parse tree.
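Expat itself only consumes bytes, so with the stdlib `xml.parsers.expat` binding the closest one can get is a thin wrapper. A minimal sketch, assuming Python 3; `parse_text` is a hypothetical name. When handed a `str`, it does no encoding detection: it encodes to UTF-8 and passes `encoding="utf-8"` to `ParserCreate`, which (per the Python and expat documentation) overrides whatever the XML declaration claims. The re-encoding is an internal detail forced by expat's byte-oriented interface, not a guess:

```python
import xml.parsers.expat
from typing import Union


def parse_text(data: Union[bytes, str]) -> str:
    """Hypothetical helper: accept raw bytes or an already-decoded str."""
    if isinstance(data, str):
        # Already Unicode: nothing to guess. Expat needs bytes, so encode as
        # UTF-8 and state that explicitly; this overrides the declaration.
        parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
        payload = data.encode("utf-8")
    else:
        # Raw bytes: let expat detect the encoding as usual.
        parser = xml.parsers.expat.ParserCreate()
        payload = data
    chunks = []
    parser.CharacterDataHandler = chunks.append  # collect character data
    parser.Parse(payload, True)
    return "".join(chunks)


# The declaration says ISO-8859-1, but this str also holds a euro sign,
# which ISO-8859-1 cannot represent; the str path parses it regardless.
doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9 \u20ac</p>'
assert parse_text(doc) == "caf\u00e9 \u20ac"
```

The design choice is simply to branch on the input type: bytes keep the traditional detect-and-decode path, while Unicode input bypasses encoding detection entirely.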

Yes, I agree. Unfortunately, expat isn't smart enough, which caused this whole discussion.

