Andreas Jung wrote:

--On 15 January 2007 15:44:01 +0100 Martijn Faassen <[EMAIL PROTECTED]> wrote:
On 1/15/07, Andreas Jung <[EMAIL PROTECTED]> wrote:
ok, got it. But this problem can be solved easily by changing the
encoding within the preamble.

I would say refusing to guess and bailing out with an error message is
better in this case. The Zen of Python:

In the face of ambiguity, refuse the temptation to guess.

Sorry, but I don't get your point. What's happening with an XML document inside a ZPT?

My point is that:

u'<?xml version="1.0" encoding="ISO-8859-1"?><foo>Some non-ascii text</foo>'

is confusing at best. One part of this says it's a unicode string, the other part says it's in encoding latin-1. What is it? What happens to this if you recode this to, say, UTF-8? What happens to this if you parse and *then* serialize it? What does the developer expect will happen? What do users expect when they enter XML in a form and include an encoding declaration?
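
To make the recoding question concrete, here is a minimal sketch in present-day Python 3 (which postdates this thread), using the standard library's xml.etree.ElementTree. The sample text and the variable names are illustrative assumptions, not anything from ZPT. It suggests what can go wrong when the text is recoded to UTF-8 while the declaration still says ISO-8859-1: the parser trusts the declaration and misreads the bytes.

```python
import xml.etree.ElementTree as ET

# Illustrative document: a text (unicode) string that nevertheless
# declares a byte encoding.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'

# Bytes that match the declaration: the parser recovers the text.
matching = ET.fromstring(doc.encode("iso-8859-1")).text

# Recoded to UTF-8 with the declaration left untouched: the parser
# still decodes as ISO-8859-1, so the two UTF-8 bytes of the accented
# letter come back as two separate characters.
mismatched = ET.fromstring(doc.encode("utf-8")).text

print(repr(matching))
print(repr(mismatched))
```

The second result is silently wrong rather than an error, which is the kind of bug the declaration/string mismatch invites.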

I proposed that we spare everyone this worry by simply not accepting such input.
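
A rough sketch of what "not accepting this" could look like, written in Python 3. The helper name check_unicode_xml and its error messages are hypothetical; this is not an existing ZPT or Zope function, only one way the refusal might be expressed.

```python
import re

# Matches an XML declaration that carries an encoding pseudo-attribute.
_DECLARED_ENCODING = re.compile(
    r"""^\s*<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def check_unicode_xml(text):
    """Hypothetical helper: reject text strings that declare a byte encoding."""
    if not isinstance(text, str):
        raise TypeError("expected a text (unicode) string")
    match = _DECLARED_ENCODING.match(text)
    if match:
        # Refuse to guess which of the two claims should win.
        raise ValueError(
            "unicode XML must not declare an encoding (found %r)" % match.group(1)
        )
    return text
```

A declaration without an encoding attribute, or no declaration at all, passes through unchanged; only the contradictory combination is refused.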

- XML data encoded as XXX comes in (either by editing the XML file through
  the ZMI or FTP/WebDAV upload)

- ZPT converts the encoded string to unicode based on the encoding in the preamble

- for parsing it is up to the application to decide what to do with the data. It is not up to the editor to decide how the ZPT engine should deal with XML internally. The ZPT engine decides to serialize the unicode string as utf-8 and to fix the XML preamble (which will result in a valid XML file that should be identical to the original file, except that the encoding might be different).
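
The three steps above can be sketched roughly as follows in Python 3. The function name normalize_to_utf8 is hypothetical and this is not the actual ZPT implementation; it also assumes an ASCII-compatible input encoding, falls back to UTF-8 when no encoding is declared, and always writes version 1.0 in the new preamble.

```python
import re

# An XML declaration at the very start of the data.
_DECL = re.compile(rb"^<\?xml\s[^>]*\?>")
# The encoding pseudo-attribute inside that declaration.
_ENC = re.compile(rb"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def normalize_to_utf8(data):
    """Hypothetical helper: decode by declared encoding, re-serialize as UTF-8."""
    encoding = "utf-8"  # assumption: default when nothing is declared
    decl = _DECL.match(data)
    if decl:
        found = _ENC.search(decl.group(0))
        if found:
            encoding = found.group(1).decode("ascii")
        # Drop the old preamble so the rewritten one is the only one.
        data = data[decl.end():]
    text = data.decode(encoding)  # step 2: bytes -> unicode
    # Step 3: serialize as UTF-8 with a matching preamble.
    return b'<?xml version="1.0" encoding="UTF-8"?>' + text.encode("utf-8")
```

For example, ISO-8859-1 input comes back as UTF-8 bytes whose preamble says UTF-8, so the declaration and the bytes agree again.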

I still don't see what should be ambiguous with this approach.

Ambiguous in that the string seems to say it's in two encodings at once. You're then "guessing": you're letting the Python string type trump the declaration. Then, since we've shown that leads to bugs, you propose to actually change the encoding declaration of the XML document. I wonder what people then expect to happen upon serialization. In effect, your proposal would, I think, serialize to UTF-8 only, right? (in which case the encoding declaration can be dropped as it's the default)
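
The parenthetical about UTF-8 being the default can be checked with a short Python 3 sketch using the standard library's xml.etree.ElementTree. The sample element and variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes with no declaration: the parser assumes UTF-8 and succeeds.
utf8_text = ET.fromstring("<foo>caf\u00e9</foo>".encode("utf-8")).text

# ISO-8859-1 bytes with no declaration: the lone 0xE9 byte is not valid
# UTF-8, so the parser rejects the document instead of guessing.
try:
    ET.fromstring("<foo>caf\u00e9</foo>".encode("iso-8859-1"))
    latin1_accepted = True
except ET.ParseError:
    latin1_accepted = False

print(repr(utf8_text), latin1_accepted)
```

So a UTF-8 serialization stays valid without any declaration, whereas other byte encodings depend on one.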



Zope3-dev mailing list
