--On 15 January 2007 22:15:46 +0100 Martijn Faassen <[EMAIL PROTECTED]> wrote:
My point is that u'<?xml version="1.0" encoding="ISO-8859-1"?><foo>Some non-ascii text</foo>' is confusing at best. One part of it says it's a unicode string; the other part says it's encoded in latin-1.
The string above would be used for internal storage but *not* for processing. By the way, this is no different from storing HTML files as unicode strings. An application must convert the unicode string back to a serialized string, either using the encoding specified in the preamble or using a 'general' encoding that covers the whole Unicode character set, such as utf-8, and changing the encoding in the preamble accordingly. Both are legitimate approaches.
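To make the two approaches concrete, here is a minimal sketch in modern Python, where `str` plays the role of the unicode string. The function names and the regular expression are assumptions made for illustration; they are not taken from Zope or from any XML library, and the pattern only handles a declaration at the very start of the text.

```python
import re

# Captures the encoding pseudo-attribute of an XML declaration that sits
# at the very start of the text: (prefix)(quote)(encoding name)(same quote).
_DECL_ENCODING = re.compile(
    r"""^(<\?xml[^>]*?encoding=)(["'])([A-Za-z][A-Za-z0-9._-]*)\2"""
)


def serialize_as_declared(text, default="utf-8"):
    """Approach 1: encode with the encoding named in the preamble."""
    match = _DECL_ENCODING.match(text)
    encoding = match.group(3) if match else default
    return text.encode(encoding)


def serialize_as_utf8(text):
    """Approach 2: rewrite the preamble to utf-8, then encode as utf-8."""
    rewritten = _DECL_ENCODING.sub(r"\g<1>\g<2>utf-8\g<2>", text, count=1)
    return rewritten.encode("utf-8")


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'
latin1_bytes = serialize_as_declared(doc)
utf8_bytes = serialize_as_utf8(doc)
```

In both cases the bytes and the declaration agree with each other after serialization, which is the property that matters to a downstream parser.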
There is no ambiguity. A smart XML parser will represent an XML document *independent* of the source encoding in the most general way, storing textual content as unicode (or at least as utf-8).
I still don't see what should be ambiguous about this approach.
Ambiguous in that the string seems to say it's in two encodings at once. You're then "guessing": you're letting the Python string type trump the declaration. Then, since we've shown that leads to bugs, you propose to actually change the encoding declaration of the XML document. I wonder what people then expect to happen upon serialization. In effect, your proposal would, I think, serialize to UTF-8 only, right? (In which case the encoding declaration can be dropped, as it's the default.)
When you download a ZPT through FTP/WebDAV, the unicode representation of the XML is converted using the 'output_encoding' property of the corresponding ZPT, which is set when a new XML document is uploaded (and taken from the preamble). So when you upload a latin1 XML file, you should get it back as valid latin1 through FTP/WebDAV.
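The upload/download round trip described above could look roughly like the following sketch. `StoredTemplate` is a hypothetical stand-in, not Zope's actual ZPT class; only the `output_encoding` attribute name comes from the description above. Sniffing the preamble from raw bytes this way is an assumption that only holds for ASCII-compatible encodings.

```python
import re

# Finds the encoding named in an XML declaration at the start of raw bytes.
_ENCODING_IN_PREAMBLE = re.compile(
    rb"""^<\?xml[^>]*?encoding=["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


class StoredTemplate:
    """Hypothetical ZPT-like object that stores its text as unicode."""

    def upload(self, raw, default_encoding="utf-8"):
        # Remember the encoding from the preamble so that download()
        # can hand back the same serialization later.
        match = _ENCODING_IN_PREAMBLE.match(raw)
        self.output_encoding = (
            match.group(1).decode("ascii") if match else default_encoding
        )
        self.text = raw.decode(self.output_encoding)

    def download(self):
        # Serialize the internal unicode text with the remembered encoding.
        return self.text.encode(self.output_encoding)


original = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'.encode(
    "iso-8859-1"
)
template = StoredTemplate()
template.upload(original)
round_tripped = template.download()
```

Because the preamble is left untouched and the same encoding is used in both directions, the downloaded bytes are identical to the uploaded ones.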
When you download text/xml content through the ZPublisher, the ZPublisher converts unicode textual content to an encoding that is either taken from an already set 'content-type: text/...; charset=XXXXX' HTTP header or, as a fallback, from the zpublisher-default-encoding property defined in the zope.conf file. So in both cases the application can specify the encoding of the serialized XML content. Where is the problem?
Andreas
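The ZPublisher fallback rule described above (charset from the content-type header first, configured default otherwise) can be sketched as follows. This is an illustration of the rule only, not ZPublisher's real code; both function names and the `default_encoding` parameter are assumptions standing in for the configured default.

```python
def charset_from_content_type(header):
    """Return the charset parameter of a Content-Type value, or None."""
    if not header:
        return None
    # Parameters follow the media type, separated by semicolons.
    for part in header.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip("\"'")
    return None


def encode_response_body(text, content_type=None, default_encoding="utf-8"):
    """Encode unicode text using the header charset, else the default."""
    encoding = charset_from_content_type(content_type) or default_encoding
    return text.encode(encoding)


with_header = encode_response_body("caf\u00e9", "text/xml; charset=iso-8859-1")
with_default = encode_response_body("caf\u00e9", "text/xml")
```

Either way the application decides the output encoding, by setting the header itself or by relying on the configured default.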
_______________________________________________ Zope3-dev mailing list Zope3email@example.com Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com