--On 15 January 2007 22:15:46 +0100 Martijn Faassen <[EMAIL PROTECTED]> wrote:

My point is that:

u'<?xml version="1.0" encoding="ISO-8859-1"?><foo>Some non-ascii

is confusing at best. One part of this says it's a unicode string, the
other part says it's in encoding latin-1.

The string above would be used for internal storage, but *not* for processing. By the way, this is no different from storing HTML files as unicode strings. An application must convert the unicode string back to a serialized string. It can use the encoding specified inside the preamble, or a 'general' encoding that covers the whole Unicode character set (such as utf-8) while changing the encoding inside the preamble to match. Both are legitimate approaches.
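
A minimal Python 3 sketch of the two serialization approaches, assuming the document is held as a str whose declaration (if any) names an ASCII-compatible encoding. The helpers `declared_encoding` and `serialize` are hypothetical, not part of Zope, and the regular expressions are a simplification rather than a full XML declaration parser.

```python
import re

# Simplified patterns for an XML declaration; hypothetical helpers, not a Zope API.
_DECL = re.compile(r'^<\?xml[^>]*\?>')
_ENCODING = re.compile(r'^<\?xml[^>]*?\bencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1')


def declared_encoding(text, default="utf-8"):
    """Return the encoding named in the XML declaration, or `default`."""
    m = _ENCODING.match(text)
    return m.group(2) if m else default


def serialize(text, target=None):
    """Encode a unicode XML document to bytes.

    target=None: keep the encoding declared in the preamble.
    target="utf-8" (or another codec): encode with it and rewrite the
    preamble to match. This sketch drops other pseudo-attributes such
    as standalone and always writes version="1.0".
    """
    if target is None:
        return text.encode(declared_encoding(text))
    body = _DECL.sub("", text, count=1)
    return ('<?xml version="1.0" encoding="%s"?>' % target + body).encode(target)


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'
as_declared = serialize(doc)        # latin-1 bytes, preamble unchanged
as_utf8 = serialize(doc, "utf-8")   # utf-8 bytes, preamble rewritten
```

Either result is self-consistent: the declaration always names the encoding the bytes are actually in.
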
There is no ambiguity. A smart XML parser represents an XML document
*independently* of the source encoding, in the most general way (storing
textual content as unicode, or at least as utf-8).
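
To illustrate the claim, here is a small Python 3 sketch that uses the standard library's `xml.etree.ElementTree` as a stand-in parser (an assumption; the thread does not name a specific parser). The same document is serialized once as ISO-8859-1 and once as UTF-8, and both byte strings parse to identical `str` content.

```python
import xml.etree.ElementTree as ET

TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><foo>caf\u00e9</foo>'


def encode_as(enc):
    """Serialize the same document in `enc`, with a matching declaration."""
    return TEMPLATE.format(enc=enc).encode(enc)


latin1_bytes = encode_as("ISO-8859-1")
utf8_bytes = encode_as("UTF-8")

# The byte sequences differ (b"\xe9" versus b"\xc3\xa9" for the accented e) ...
latin1_root = ET.fromstring(latin1_bytes)
utf8_root = ET.fromstring(utf8_bytes)
# ... but the parsed text is the same str in both cases.
```
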

I still don't see what should be ambiguous about this approach.

Ambiguous in that the string seems to say it's in two encodings at once.
You're then "guessing": you're letting the Python string type trump the
declaration. Then, since we've shown that this leads to bugs, you propose to
actually change the encoding declaration of the XML document. I wonder
what people then expect to happen upon serialization. In effect, your
proposal would, I think, serialize to UTF-8 only, right? (In which case
the encoding declaration can be dropped, as UTF-8 is the default.)
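
The thread does not spell out the bug it refers to, but this Python 3 sketch (again using `xml.etree.ElementTree` as a stand-in parser, an assumption) shows one way the string type and the declaration can disagree: encoding the unicode string as UTF-8 while keeping its ISO-8859-1 declaration makes the parser decode the bytes as latin-1. It also shows that a UTF-8 serialization needs no declaration at all.

```python
import xml.etree.ElementTree as ET

# A unicode string that still carries a latin-1 declaration.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'

# Trusting the declaration: the bytes and the declaration agree.
consistent = ET.fromstring(text.encode("latin-1")).text

# Trusting the string type: UTF-8 bytes under a latin-1 declaration.
# The parser follows the declaration and decodes each UTF-8 byte as latin-1.
mismatched = ET.fromstring(text.encode("utf-8")).text

# Serializing to UTF-8 only: the declaration can simply be dropped.
body = text.split("?>", 1)[1]
undeclared = ET.fromstring(body.encode("utf-8")).text
```
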

When you download a ZPT through FTP/WebDAV, the unicode representation
of the XML is converted using the 'output_encoding' property of the
corresponding ZPT, which is set when a new XML document is uploaded (and
taken from the preamble). So when you upload a latin-1 XML file, you should
get it back as valid latin-1 through FTP/WebDAV.
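
A rough Python 3 sketch of that round trip. `TemplateSketch` is a hypothetical stand-in, not Zope's ZPT class; only the `output_encoding` name comes from the description above. It assumes an ASCII-compatible source encoding, so the declaration can be read directly from the raw bytes.

```python
import re

_ENCODING = re.compile(r'^<\?xml[^>]*?\bencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1')


class TemplateSketch:
    """Hypothetical stand-in for a ZPT holding XML; not Zope's actual class."""

    def __init__(self):
        self.text = ""                   # internal unicode storage
        self.output_encoding = "utf-8"   # fallback when no declaration is found

    def upload(self, data):
        """Store uploaded bytes as unicode, remembering the declared encoding."""
        # The declaration is ASCII, so decoding a short prefix is enough to find it.
        head = data[:200].decode("ascii", errors="replace")
        m = _ENCODING.match(head)
        self.output_encoding = m.group(2) if m else "utf-8"
        self.text = data.decode(self.output_encoding)

    def download(self):
        """Re-encode with the stored output_encoding, as for FTP/WebDAV."""
        return self.text.encode(self.output_encoding)


t = TemplateSketch()
src = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'.encode("latin-1")
t.upload(src)
round_tripped = t.download()   # same latin-1 bytes that were uploaded
```
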

When you download text/xml content through the ZPublisher, the ZPublisher converts unicode textual content to an encoding that is
either taken from an already set 'content-type: text/...; charset=XXXXX'
HTTP header or, as a fallback, from the zpublisher-default-encoding property
defined in the zope.conf file.
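
A minimal Python 3 sketch of that lookup order, not ZPublisher's actual code: an explicit charset in the Content-Type header wins; otherwise a default applies. The `default` parameter stands in for the zpublisher-default-encoding setting, and `choose_encoding` and `encode_response` are hypothetical names. The header is parsed with the standard library's `email.message.Message`.

```python
from email.message import Message


def choose_encoding(content_type, default="utf-8"):
    """Return the charset from a Content-Type value, else `default`."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()   # lowercased, or None if absent
        if charset:
            return charset
    return default


def encode_response(body, content_type=None, default="utf-8"):
    """Encode a unicode body the way the text above describes."""
    enc = choose_encoding(content_type, default)
    return body.encode(enc), enc


data, enc = encode_response("caf\u00e9", "text/xml; charset=ISO-8859-1")
```
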

So in both cases the application can specify the encoding of the serialized
XML content. Where is the problem?



Zope3-dev mailing list