Bjorn Stabell wrote:

* gets charset from manage_page_charset (same as ZMI), but can be overridden
* stores field values as encoded text (not Unicode), but lets you specify
which encoding to use
  (confusingly calls this "unicode" mode)
* messages are stored as UTF-8 (hardcoded)

While there is no question that Formulator's user interface is confusing where Unicode is concerned, most of this is not correct (unless there are bugs I don't know about).

Formulator has two modes: unicode mode and 'classic mode'. In unicode mode, all strings are stored as Python unicode strings. In classic mode, all strings are stored in 'whatever encoding the user is using'. It's possible to convert a form from one mode to the other, and an encoding can be specified for that conversion. In unicode mode, that encoding is ignored, however.
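To make the mode switch concrete, here is a minimal sketch of what converting stored values between the two modes amounts to. This is not Formulator's actual code: the function names and the plain dict of field values are hypothetical, and it is written for current Python 3, where `str` is the Unicode type (at the time of this thread that would have been `unicode`).

```python
def to_unicode_mode(values, stored_encoding):
    """Decode every byte string in a dict of field values to Unicode text.

    `values` and `stored_encoding` are illustrative names, not Formulator API.
    Non-string values (numbers, flags) are passed through untouched.
    """
    return {
        key: value.decode(stored_encoding) if isinstance(value, bytes) else value
        for key, value in values.items()
    }


def to_classic_mode(values, stored_encoding):
    """Encode every Unicode string back to bytes in the given encoding."""
    return {
        key: value.encode(stored_encoding) if isinstance(value, str) else value
        for key, value in values.items()
    }


# Values as they might sit in a 'classic mode' form saved in Latin-1.
classic = {"title": "Caf\u00e9".encode("latin-1"), "width": 20}
unicode_values = to_unicode_mode(classic, "latin-1")
```

The encoding only matters at the moment of conversion; once the values are Unicode there is nothing left for it to describe, which is why it is ignored in unicode mode.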

Classic mode basically exists so as not to break all Formulator forms already in existence. This complicated the design significantly, but I thought this was important.

Quite independently of this, fields can also be configured to *deliver* unicode upon validation/conversion. The character set of the page that the form is in can be specified in the form settings.

I suggest this way of dealing with Unicode right now in Zope 2:

General note: this approach sounds good to me, but I know from hard experience how difficult it is to convert an existing application to full Unicode.

(1) Let ZPublisher do the encoding/decoding of form input and HTML output:

a. Always set a character encoding in an HTTP Content-type request

Silva does this (and Formulator too).

  b. Always append :ustring/utext/ulines/utokens:ENCODING to field names of
fields that support Unicode
      (we may need some library code to make this easier)

Formulator won't be able to do 'b' very easily. It'll do its own converting to unicode though for fields that want this.
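For code that does build its own field names, the "library code" mentioned in 'b' could be as small as the sketch below. The helper name is hypothetical, and the suffix order simply follows the pattern quoted above (converter, then encoding); the exact form ZPublisher expects should be checked against its documentation before relying on this.

```python
# Converter names taken from the suggestion quoted above.
UNICODE_CONVERTERS = ("ustring", "utext", "ulines", "utokens")


def unicode_field_name(name, converter="ustring", encoding="utf-8"):
    """Return a form field name carrying a Unicode converter suffix.

    Hypothetical helper: it only formats the name, it does not talk to
    ZPublisher, and the suffix order is an assumption based on the text.
    """
    if converter not in UNICODE_CONVERTERS:
        raise ValueError("not a Unicode converter: %r" % (converter,))
    return "%s:%s:%s" % (name, converter, encoding)


field = unicode_field_name("title")
multi = unicode_field_name("keywords", converter="ulines", encoding="latin-1")
```

Keeping the suffix in one function means the page encoding is named in a single place rather than repeated in every template.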

(2) Store Unicode strings directly in the ZODB.  The ZODB is perfectly
capable of storing strings in Python's internal Unicode format; no need to
encode the text to UTF-8 or some other encoding.

Silva has been doing this fully since version 0.9.2, released in the summer of last year. Formulator took a while longer to catch up (before that, it would only interoperate if the form titles etc. were pure ASCII), but is now a first-class citizen in a Zope/unicode environment. Its XML serialization is UTF-8 in this mode.

(3) Encode/decode yourself when reading from/ writing to other external data
sources such as files and other databases.  Do it just before you write, or
just after you read, so that as much code as possible can be
encoding-agnostic.  Keep the encoding/decoding as close to the "source data"
as possible.   The best way to do it is (in most cases) to specify the
encoding on the IO stream, and let Python do the encoding/decoding for you
transparently.  If possible, get the encoding from the external data source
(e.g., the file) instead of relying on a magical global variable.  If you
have to rely on a global variable, let it be manage_page_charset.
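As an illustration of point (3), here is a minimal sketch that puts the encoding on the stream so the rest of the code only ever sees Unicode text. The file name, helper names, and the choice of Latin-1 are assumptions made for the example; `io.open` is the standard library call, written here for current Python 3.

```python
import io
import os
import tempfile


def write_text(path, text, encoding):
    """Write Unicode text; the stream encodes it just before it hits disk."""
    with io.open(path, "w", encoding=encoding) as stream:
        stream.write(text)


def read_text(path, encoding):
    """Read a file; the stream decodes the bytes just after reading them."""
    with io.open(path, "r", encoding=encoding) as stream:
        return stream.read()


# Hypothetical external data source: a Latin-1 encoded text file.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
write_text(path, "Bj\u00f6rn", "latin-1")
round_tripped = read_text(path, "latin-1")

# Reading the raw bytes shows that the encoding exists only on disk.
with open(path, "rb") as raw:
    on_disk = raw.read()
```

The encoding is passed in explicitly here; where the data source can report its own encoding, that value would be the better argument than a global default.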

(4) [This is really just advice...] Resist patching your code to work with
components that don't deal with Unicode.  Others are likely having the
same problem, so to avoid ending up with lots of ugly patches (that are the
source of mysterious Unicode problems), fix the problem at its source: the
other component.  It's really not that difficult to fix (if we agree on how
it should be fixed ;)

It's actually quite difficult to fix if you care about backwards compatibility. Fixing Formulator was quite complicated. You're definitely making this sound far easier than it is. It's a good thing to do, and Silva has it, but the words 'not that difficult' don't fit in this debate.

None of the above components handles Unicode in this way, but it seems to be
how the Unicode support in Zope 2 was meant to be used.

You're actually wrong about Formulator. :)



Zope-Dev maillist - [EMAIL PROTECTED]
** No cross posts or HTML encoding! **