-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andreas Jung wrote:
>
> --On 14 January 2007 18:14:45 +0000 Chris Withers <[EMAIL PROTECTED]>
> wrote:
>
>> Dieter Maurer wrote:
>>> A halfway intelligent parser would accept Unicode when it gets it
>>> and concentrate on the remaining part of its task: either reporting
>>> structural events or building a parse tree.
>> The trivial fix I use in Twiddler is as follows:
>>
>> if isinstance(source,unicode):
>>     source = source.encode('utf-8')
>>
>> Of course, this assumes a heading of either <?xml version="1.0"
>> encoding="utf-8"?> or a missing encoding attribute, in which case the xml
>> spec states that the string must be utf-8 encoded.
>
> The encoding of the XML preamble should not matter when parsing a XML
> document stored as unicode string.
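
For illustration only, the fix quoted above can be sketched in Python 3, where `str` plays the role of the thread's `unicode`. The helper name `to_utf8_xml` and the regular expression are hypothetical, not code from Twiddler or Zope, and the standard library's ElementTree stands in for whichever parser is actually in use. The sketch also drops the `encoding` pseudo-attribute so that the declaration cannot contradict the UTF-8 bytes that are produced.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: the encoding pseudo-attribute of a leading XML declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?)\s+encoding\s*=\s*(["\'])[^"\']*\2')


def to_utf8_xml(source):
    """Return bytes for the parser; text input is serialized as UTF-8."""
    if isinstance(source, bytes):
        # Already serialized: leave it alone and let its declaration speak.
        return source
    # Without an encoding attribute, the parser falls back to UTF-8,
    # which matches the bytes produced on the next line.
    text = _DECL_ENCODING.sub(r'\1', source, count=1)
    return text.encode('utf-8')


# A text document whose declaration names an encoding it is not stored in.
doc = '<?xml version="1.0" encoding="iso-8859-15"?><p>caf\u00e9</p>'
data = to_utf8_xml(doc)
root = ET.fromstring(data)
```

Removing the attribute rather than rewriting it to "utf-8" is a design choice made here for brevity; either way the declaration and the bytes agree.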

That encoding is a *lie*, which is the real problem.  Parsers expect it
to be *correct* and, if it is missing, expect the text to be encoded as
UTF-8, per the spec (if the document comes from an HTTP request, then
the application may supply the encoding from the request headers).
Nothing in the XML specs allows or specifies any behavior for XML
documents serialized as unicode, because such serializations are
*programming language specific*.

> It is of importance as soon as you
> convert the document back to a stream e.g. when we deliver the content
> back to a browser or a FTP client. The ZPublisher (for Zope 2) deals with
> that by changing the encoding parameter of the preamble for XML documents
> based on the desired output encoding. utf-8 is always a good choice however
> other encodings like iso-8859-15 might raise UnicodeDecodeErrors. The Zope 2
> publisher "avoids" this problem converting the unicode result using
> errors='replace' (which is likely something we might discuss :-))

Unicode XML is not only problematic for streaming.  For instance, you
*can't* pass a Unicode string to libxml2 *at all*, unless you want a
core dump.  The API requires that you pass it strings encoded as UTF-8.


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          [EMAIL PROTECTED]
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v184.108.40.206 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFq9wf+gerLs4ltQ4RAvBkAKCGZke7HHr7vWQKcwn5IHW93GHlFQCgyXMJ
a+vZYi2VRnZTt1XBt7O6U3Y=
=+i3B
-----END PGP SIGNATURE-----
_______________________________________________
Zope3-dev mailing list
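
As a small illustration of the output-encoding point quoted in this message, the sketch below encodes text to iso-8859-15 in Python 3 (where `str` is what the thread calls unicode). The sample string is made up. It shows only the behaviour of Python's codecs, not ZPublisher's code, and the `xmlcharrefreplace` handler is an alternative added here for comparison, not something proposed in the thread.

```python
# Made-up sample: the euro sign exists in iso-8859-15, the CJK character does not.
text = '<p>price: 5\u20ac, name: \u4e2d</p>'

# Strict encoding (the default) fails on the unrepresentable character.
try:
    text.encode('iso-8859-15')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors='replace' silently substitutes '?', losing the character.
replaced = text.encode('iso-8859-15', errors='replace')

# errors='xmlcharrefreplace' emits a numeric character reference instead,
# which an XML parser can turn back into the original character.
charrefs = text.encode('iso-8859-15', errors='xmlcharrefreplace')
```

Note that the failure on the way out is a `UnicodeEncodeError`; `UnicodeDecodeError` is the one raised when bytes are decoded to text.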