--On 14 January 2007 18:14:45 +0000 Chris Withers <[EMAIL PROTECTED]> wrote:

Dieter Maurer wrote:
A halfway intelligent parser would accept Unicode when it gets it
and concentrate on the remaining part of its task: either reporting
structural events or building a parse tree.

The trivial fix I use in Twiddler is as follows:

if isinstance(source, unicode):
    # hand the parser a UTF-8 byte string instead of a unicode object
    source = source.encode('utf-8')

Of course, this assumes a declaration of either <?xml version="1.0"
encoding="utf-8"?> or a missing encoding attribute, in which case the XML
spec states that the document must be UTF-8 encoded (or UTF-16 with a byte
order mark).
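
A minimal sketch of that assumption made explicit, written for Python 3 (where
the old unicode type is str). The helper name to_parser_bytes and the regular
expression are hypothetical illustrations, not Twiddler's code: the helper only
encodes to UTF-8 when the declaration says utf-8 or names no encoding at all,
and refuses otherwise.

```python
import re

# Matches an XML declaration at the very start of the text and captures the
# value of its encoding attribute, if there is one. Illustrative only.
_DECLARED_ENCODING = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def to_parser_bytes(source):
    """Return bytes suitable for a byte-oriented XML parser.

    Hypothetical helper: bytes pass through untouched; text is encoded to
    UTF-8, but only if that does not contradict the XML declaration.
    """
    if isinstance(source, bytes):
        return source
    match = _DECLARED_ENCODING.match(source)
    # No encoding attribute (or no declaration) means UTF-8 is a valid choice.
    declared = match.group(1).lower() if match else 'utf-8'
    if declared not in ('utf-8', 'utf8'):
        raise ValueError(
            'declaration says %r, refusing to encode as UTF-8' % declared
        )
    return source.encode('utf-8')
```

Raising is just one option here; rewriting the declaration to match the bytes
actually produced would work as well.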

The encoding named in the XML preamble should not matter when parsing an XML
document stored as a unicode string. It becomes important as soon as you
convert the document back to a byte stream, e.g. when delivering the content
to a browser or an FTP client. The ZPublisher (for Zope 2) deals with that by
changing the encoding parameter of the preamble for XML documents to match
the desired output encoding. utf-8 is always a good choice; other encodings
like iso-8859-15 might raise UnicodeEncodeErrors for characters they cannot
represent. The Zope 2 publisher "avoids" this problem by converting the
unicode result using errors='replace' (which is likely something we might
discuss :-))
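
As a rough illustration of the output side, again in Python 3: the function
below rewrites the encoding attribute of the preamble to match the requested
output encoding and then encodes the text. The name serialize and the regular
expression are hypothetical, and this is not the ZPublisher's actual code; it
only shows the difference between strict encoding and errors='replace'.

```python
import re

# Captures the part of the XML declaration up to the encoding value, plus the
# quote character used, so the value can be swapped. Illustrative only.
_ENCODING_ATTR = re.compile(
    r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2'
)


def serialize(document, encoding='utf-8', errors='strict'):
    """Encode a text XML document, keeping the declaration consistent.

    Hypothetical sketch: if the document has no encoding attribute, nothing
    is rewritten, so a non-UTF-8 output would then be mislabeled.
    """
    document = _ENCODING_ATTR.sub(
        lambda m: m.group(1) + m.group(2) + encoding + m.group(2),
        document,
        count=1,
    )
    return document.encode(encoding, errors)


text = '<?xml version="1.0" encoding="utf-8"?><p>\u4e2d</p>'

# UTF-8 can represent every character, so strict encoding always succeeds.
utf8_bytes = serialize(text)

# iso-8859-15 has no code point for U+4E2D: strict mode raises
# UnicodeEncodeError, while errors='replace' silently emits '?' instead.
lossy_bytes = serialize(text, 'iso-8859-15', errors='replace')
```

The errors='replace' variant never fails, but it loses data without telling
anyone, which is the trade-off open for discussion.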



Zope3-dev mailing list