Andreas Jung wrote:
--On 15. Januar 2007 15:44:01 +0100 Martijn Faassen
<[EMAIL PROTECTED]> wrote:
On 1/15/07, Andreas Jung <[EMAIL PROTECTED]> wrote:
ok, got it. But this problem can be solved easily by changing the
encoding within the preamble.
I would say refusing to guess and bailing out with an error message is
better in this case. The Zen of Python:
In the face of ambiguity, refuse the temptation to guess.
Sorry, but I don't get your point. What's happening with an XML document inside a ZPT?
My point is that:
u'<?xml version="1.0" encoding="ISO-8859-1"?><foo>Some non-ascii text</foo>'
is confusing at best. One part of this says it's a unicode string, the
other part says it's in encoding latin-1. What is it? What happens to
this if you recode this to, say, UTF-8? What happens to this if you
parse and *then* serialize it? What does the developer expect will
happen? What do users expect when they enter XML in a form and include
an encoding declaration?
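To make the recoding question concrete, here is a minimal sketch in plain Python 3 (no Zope or ZPT involved; the sample document and variable names are made up for illustration). It shows what happens if the unicode string is recoded to UTF-8 while the declaration is left alone:

```python
# A unicode string that also carries an encoding declaration.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'

# Recode to UTF-8 without touching the declaration.
recoded = text.encode("utf-8")

# The bytes are UTF-8 now, but the declaration still claims ISO-8859-1.
assert b'encoding="ISO-8859-1"' in recoded

# A consumer that trusts the declaration decodes the bytes as latin-1
# and gets mojibake instead of the original text.
misread = recoded.decode("iso-8859-1")
```

Here misread contains "cafÃ©" rather than "café", which is the kind of silent corruption the questions above are getting at.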
I proposed that we spare everybody the worry by simply not accepting this.
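As a rough sketch of what "not accepting this" could look like (the helper name refuse_ambiguous and the regular expression are hypothetical, not existing ZPT code), a check in Python 3 might be:

```python
import re

# Matches an XML declaration that carries an encoding pseudo-attribute.
# Simplified on purpose; a real check might need to be stricter.
_DECL = re.compile(r'<\?xml\s[^>]*?encoding\s*=\s*["\']([^"\']+)["\']')


def refuse_ambiguous(source):
    """Return source unchanged, or raise if it is a unicode string
    that also declares an encoding (hypothetical helper)."""
    if isinstance(source, str):
        match = _DECL.match(source)
        if match:
            raise ValueError(
                "unicode string declares encoding %r; pass encoded bytes "
                "or drop the declaration" % match.group(1)
            )
    return source


try:
    refuse_ambiguous('<?xml version="1.0" encoding="ISO-8859-1"?><foo/>')
except ValueError as exc:
    print("rejected:", exc)
```

Byte strings pass through untouched, since for them the declaration is the only statement about the encoding and there is nothing to guess.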
- XML data encoded as XXX comes in (either by editing the XML file through
the ZMI or FTP/WebDAV upload)
- ZPT converts the encoded string to unicode based on the encoding in
- for parsing it is up to the application to decide what to do with the
data. It is not up to the editor to decide how the ZPT engine should
deal with XML internally. The ZPT engine decides to serialize the
unicode string as utf-8 and to fix the XML preamble (which will result
in a valid XML file
which should be identical with the original file - except the encoding
might be different).
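The steps above could be sketched roughly as follows. This is only an illustration in Python 3 under stated assumptions: the function names to_unicode and serialize_utf8 are invented here, the declaration is found with a simplified regular expression, and none of this is the actual ZPT implementation.

```python
import re

# Simplified patterns for an XML declaration with an encoding attribute.
_DECL_BYTES = re.compile(rb'<\?xml\s[^>]*?encoding\s*=\s*["\']([^"\']+)["\']')
_DECL_TEXT = re.compile(r'^(<\?xml\s[^>]*?encoding\s*=\s*["\'])[^"\']+(["\'])')


def to_unicode(data, default="utf-8"):
    """Decode incoming bytes using the encoding named in the preamble,
    falling back to UTF-8 when none is declared."""
    match = _DECL_BYTES.match(data)
    encoding = match.group(1).decode("ascii") if match else default
    return data.decode(encoding)


def serialize_utf8(text):
    """Rewrite the declared encoding to utf-8, then encode as UTF-8."""
    fixed = _DECL_TEXT.sub(r"\g<1>utf-8\g<2>", text, count=1)
    return fixed.encode("utf-8")


incoming = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'.encode("iso-8859-1")
stored = to_unicode(incoming)
outgoing = serialize_utf8(stored)
```

Note that stored still carries the ISO-8859-1 declaration while being a unicode string, so the declaration only becomes truthful again once serialize_utf8 has rewritten it.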
I still don't see what should be ambiguous with this approach.
Ambiguous in that the string seems to say it's in two encodings at once.
You're then "guessing": you're letting the Python string type trump the
declaration. Then, since we've shown that leads to bugs, you propose to
actually change the encoding declaration of the XML document. I wonder
what people then expect to happen upon serialization. In effect, your
proposal would, I think, serialize to UTF-8 only, right? (in which case
the encoding declaration can be dropped as it's the default)
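A small check of that last point, using the standard library's xml.etree.ElementTree in Python 3 (the sample documents are made up): bytes without any declaration are parsed as UTF-8, and yield the same text as latin-1 bytes that declare their encoding.

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes with no XML declaration at all.
utf8_doc = "<foo>caf\u00e9</foo>".encode("utf-8")

# latin-1 bytes that declare their encoding in the preamble.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'.encode("iso-8859-1")

# The parser assumes UTF-8 for the first and honours the declaration
# for the second; both give the same element text.
utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text
```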
Zope3-dev mailing list