On Thu, Apr 1, 2010 at 11:52 AM, Marius Gedminas <mar...@gedmin.as> wrote:
> I don't think I'll be able to work on it, but I think it's worth
> consideration: Unicode issues with Zope 2.12. I've seen these on at
> least three different Zope 2 sites built with a combination of TTW page
> templates, Python scripts and (sometimes) DTML documents: things like
> title attributes store their data as UTF-8 strings, while page templates
> insist on Unicode objects, resulting in errors all over the place.
> Those sites worked with Zope 2.9 and broke down after an upgrade to
> 2.12. That's a not very nice thing to do to your users...

They actually broke down with the move to Zope 2.10, when we switched
Zope 2 over to the zope.tal / zope.tales packages in place of Zope 2's
own implementation. As a result, TAL has used Unicode internally ever
since.

There's the whole unicoderesolver story, which allows you to implement
an application-specific fallback. We decided back then that dealing
with this problem would be left to each application, as Zope 2 in
general has too little knowledge about your data - and nobody
volunteered to do any work on it ;)
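
To make the idea of an application-specific fallback concrete, here is a
rough sketch of a resolver that tries a list of encodings in order. It
is written for Python 3, with bytes standing in for the encoded strings
discussed here. The class name, the resolve() signature and the default
encoding list are all assumptions made for illustration; this is not
Zope's actual resolver interface.

```python
class FallbackResolver:
    """Hypothetical application-level resolver; NOT Zope's actual API."""

    def __init__(self, encodings=("utf-8", "iso-8859-15")):
        # Encodings are tried in order; the application knows its own data,
        # so it decides which candidates make sense.
        self.encodings = encodings

    def resolve(self, text):
        # Text that is already Unicode passes through untouched.
        if isinstance(text, str):
            return text
        for encoding in self.encodings:
            try:
                return text.decode(encoding)
            except UnicodeDecodeError:
                continue
        # Last resort: keep the page rendering and mark undecodable bytes.
        return text.decode("ascii", "replace")
```

An application that knows all of its stored data is UTF-8 would pass
encodings=("utf-8",) so that anything else shows up visibly as
replacement characters rather than being silently guessed.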

Plone has implemented a specific fallback which automatically converts
all UTF-8 encoded strings to Unicode. In the Plone 3.x series it also
accepted strings in other encodings and converted them via
unicode(text, 'utf-8', 'ignore'), logging such occurrences. In Plone 4
it throws an exception on any data that is neither UTF-8 nor Unicode.
In Plone 5 we'll probably log warnings for UTF-8 encoded strings and
push the responsibility for converting to Unicode into the application
code.
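
The difference between the lenient and the strict policy can be
sketched as follows. This is an illustration only, written for Python 3,
where bytes.decode('utf-8', 'ignore') plays the role of Python 2's
unicode(text, 'utf-8', 'ignore'). The function names and the logger name
are made up for the example and are not taken from Plone.

```python
import logging

logger = logging.getLogger("unicode_fallback")  # hypothetical logger name


def to_text_lenient(value):
    """Lenient policy: decode as UTF-8, drop undecodable bytes, log it."""
    if isinstance(value, str):
        return value
    text = value.decode("utf-8", "ignore")
    # If re-encoding does not give back the input, bytes were dropped.
    if text.encode("utf-8") != value:
        logger.warning("Dropped non-UTF-8 bytes from %r", value)
    return text


def to_text_strict(value):
    """Strict policy: anything that is not valid UTF-8 raises."""
    if isinstance(value, str):
        return value
    return value.decode("utf-8")  # raises UnicodeDecodeError on bad input
```

With Latin-1 input such as b"caf\xe9", the lenient version silently
loses the last character and returns "caf", while the strict version
raises UnicodeDecodeError. That trade-off is why the lenient variant
needs the logging.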

If you have a rather large application with third-party plugins and
have to deal with the encoded string to Unicode conversion, I think
such a long-term upgrade story, with policy changes happening around
major releases, is the only way to go. If you have a purely in-house
application, you can do the data and code conversion as a single
project and get it over with.

That being said, I'd like to see someone tackle the "id" / URL
segments as Unicode problem. They are currently restricted to ASCII,
which means we don't have a problem with arbitrarily encoded string
data. But there are probably enough places that rely on them being
ASCII in some way.