yuppie wrote:
Hi Yves!
Yves Bastide wrote:
GenericSetup has problems handling non-ASCII data.
1.) GenericSetup explicitly doesn't support non-UTF-8 XML in profiles.
UTF-8 is the default encoding for XML and I can't see a need to support
other XML encodings.
2.) GenericSetup explicitly doesn't support non-UTF-8 site settings. If
someone provides a good patch this feature can be added.
3.) GenericSetup is not tested with non-ASCII UTF-8 site settings. AFAIK
import works, but not export. I consider this a bug.
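Point 3 can be illustrated with a minimal sketch. It assumes a site property stored as UTF-8 bytes, which is how a Zope 2 site would typically hold it. The helpers `export_property` and `import_property` are hypothetical and are not part of GenericSetup's API. On export, the key step is to decode the stored bytes to text before handing them to the XML writer, which then emits UTF-8:

```python
from xml.dom import minidom


def export_property(name, value, site_encoding="utf-8"):
    """Hypothetical exporter: serialize one site property as UTF-8 XML.

    `value` is the raw byte string stored on the site. It is decoded
    first so the XML writer never has to guess its encoding.
    """
    text = value.decode(site_encoding)
    doc = minidom.Document()
    node = doc.createElement("property")
    node.setAttribute("name", name)
    node.appendChild(doc.createTextNode(text))
    doc.appendChild(node)
    # toxml() with an explicit encoding returns bytes plus a UTF-8 declaration.
    return doc.toxml(encoding="utf-8")


def import_property(xml_bytes, site_encoding="utf-8"):
    """Hypothetical importer: return (name, value) with value re-encoded."""
    node = minidom.parseString(xml_bytes).documentElement
    text = "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)
    return node.getAttribute("name"), text.encode(site_encoding)
```

A round trip such as `import_property(export_property("title", "Café".encode("utf-8")))` should return the original bytes unchanged.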
It treats strings sometimes as ASCII, sometimes as UTF-8, yet it has
access to two variables: its own ISetupContext.getEncoding() (whose
use I didn't fully grok) and CMF's
ISetupContext.getSite().getProperty('default_charset').
Sorry, but your assumptions are wrong:
- The default setup tool creates export contexts without specifying the
encoding, so ISetupContext.getEncoding() always returns None. And even
if it were set, it would represent the encoding of the exported files, not
the site encoding.
- getSite().getProperty('default_charset') is CMF specific and should
not be used in GenericSetup.
- The adapters adapt ISetupEnviron, not ISetupContext. getEncoding() and
getSite() are not always available.
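Taken together, these points suggest that export code should not depend on getSite(), and should not assume that getEncoding() is available or set. The following is a minimal sketch under that assumption. The `file_encoding` helper and the dummy classes are hypothetical stand-ins, not GenericSetup's own. The helper treats getEncoding() purely as the encoding of the exported files and falls back to UTF-8 when the method is missing or returns None:

```python
def file_encoding(environ, default="utf-8"):
    """Return the encoding to use for exported files.

    getEncoding() is looked up defensively, because adapters only get an
    ISetupEnviron, which is not guaranteed to provide it. The site's
    default_charset is deliberately NOT consulted here.
    """
    get_encoding = getattr(environ, "getEncoding", None)
    encoding = get_encoding() if callable(get_encoding) else None
    return encoding or default


class DefaultToolContext:
    """Hypothetical stand-in: the setup tool never sets an encoding."""

    def getEncoding(self):
        return None


class BareEnviron:
    """Hypothetical stand-in for an environ without getEncoding()."""


class ExplicitContext:
    """Hypothetical stand-in: a context created with an explicit encoding."""

    def getEncoding(self):
        return "iso-8859-15"
```

With these stand-ins, both `DefaultToolContext()` and `BareEnviron()` yield `'utf-8'`, while `ExplicitContext()` yields `'iso-8859-15'`.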
Attached is a patch that uses both of them and more or less works in my
setup. Can knowledgeable people comment on it before I file a
collector issue? (I'm using GS alongside CPS, which also needs
some patching; yet basic things, such as exporting and importing an
iso8859-15 Title in a CMF site whose default_charset is iso8859-15, should work.)
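The basic case described above amounts to transcoding at the file boundary. The following is a minimal sketch that assumes the site charset is passed in explicitly; the helper names are hypothetical. On export, the stored iso8859-15 bytes are decoded and re-encoded as UTF-8 for the profile. On import, the reverse happens. The euro sign makes a useful test character because it exists in iso8859-15 (byte 0xA4) but not in latin-1:

```python
def to_profile(value, site_charset):
    """Hypothetical: convert a stored byte string to UTF-8 for the XML file."""
    return value.decode(site_charset).encode("utf-8")


def from_profile(data, site_charset):
    """Hypothetical: convert UTF-8 profile data back to the site charset.

    Raises UnicodeEncodeError if the text cannot be represented in the
    site charset, rather than silently corrupting it.
    """
    return data.decode("utf-8").encode(site_charset)


stored_title = "Prix 10 €".encode("iso8859-15")    # b'Prix 10 \xa4'
exported = to_profile(stored_title, "iso8859-15")  # UTF-8 bytes for the file
restored = from_profile(exported, "iso8859-15")    # back to iso8859-15
```

Importing the same profile into a latin-1 site would raise an error instead of storing a wrong character.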
First of all, we need unit tests that make sure UTF-8 works, and I think
UTF-8 should be the default used by GenericSetup. Code that needs to know
how to find the site encoding can't be generic.
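Unit tests of this kind could start from something like the following sketch. It uses the standard unittest module together with hypothetical `export_title` and `import_title` helpers built on xml.etree.ElementTree; GenericSetup's real export and import handlers are not reproduced here. The tests check that non-ASCII UTF-8 data survives a round trip and that the output is valid UTF-8. No XML declaration is required for that, since UTF-8 is XML's default encoding:

```python
import unittest
import xml.etree.ElementTree as ET


def export_title(title_utf8):
    """Hypothetical exporter: UTF-8 site value -> UTF-8 XML bytes."""
    root = ET.Element("site")
    prop = ET.SubElement(root, "property", name="title")
    prop.text = title_utf8.decode("utf-8")
    return ET.tostring(root, encoding="utf-8")


def import_title(xml_bytes):
    """Hypothetical importer: UTF-8 XML bytes -> UTF-8 site value."""
    root = ET.fromstring(xml_bytes)
    text = root.find("property[@name='title']").text or ""
    return text.encode("utf-8")


class UTF8RoundTripTests(unittest.TestCase):

    TITLE = "Élysée 東京".encode("utf-8")

    def test_roundtrip_preserves_non_ascii(self):
        self.assertEqual(import_title(export_title(self.TITLE)), self.TITLE)

    def test_export_is_utf8(self):
        data = export_title(self.TITLE)
        data.decode("utf-8")  # must not raise
        self.assertIn(self.TITLE, data)

    def test_ascii_still_works(self):
        self.assertEqual(import_title(export_title(b"Portal")), b"Portal")
```

Saved as a test module, these tests can be run with `python -m unittest`.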
There is an additional problem: if tools use the default property edit
page from OFS, the properties might be stored in a different encoding than
the site's 'default_charset'. Since the default
'management_page_charset' is UTF-8 we have less trouble if we allow only
UTF-8.
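This mismatch is easy to reproduce in isolation. The following minimal sketch assumes that a property was submitted through a management page rendered in UTF-8 and was later read back using a site default_charset of iso8859-15. The `read_property` helper is hypothetical:

```python
def read_property(raw, charset):
    """Hypothetical reader: decode stored bytes with an assumed charset."""
    return raw.decode(charset)


# Bytes as submitted from a UTF-8 management page.
submitted = "é".encode("utf-8")                   # b'\xc3\xa9'

# Read with the charset that actually produced the bytes: correct.
ok = read_property(submitted, "utf-8")            # 'é'

# Read with the site's default_charset instead: mojibake, no error raised.
garbled = read_property(submitted, "iso8859-15")  # 'Ã©'
```

Decoding with iso8859-15 never fails, so the wrong interpretation goes unnoticed. Accepting only UTF-8 avoids having two competing charsets in the first place.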
Let's also not forget that the goal in CMF 2 (I think) is for all
content to be unicode strings, never encoded ones. In that case GenericSetup
only has to deal with the XML file's encoding (always UTF-8 anyway), and
nothing else.
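If all content is unicode, the only encoding decision left to GenericSetup is the one at the file boundary. The following is a minimal sketch of that policy. It uses a hypothetical `serialize_properties` helper that rejects encoded byte strings outright instead of guessing their charset:

```python
import xml.etree.ElementTree as ET


def serialize_properties(props):
    """Hypothetical exporter for an all-unicode site.

    Every value must already be text (str). Encoded bytes are refused
    because their charset would have to be guessed. UTF-8 is applied
    exactly once, when the XML document is written.
    """
    root = ET.Element("object")
    for name, value in sorted(props.items()):
        if not isinstance(value, str):
            raise TypeError(
                f"property {name!r} must be text, got {type(value).__name__}")
        ET.SubElement(root, "property", name=name).text = value
    return ET.tostring(root, encoding="utf-8")
```

Because no site charset is ever looked up, the same exporter works unchanged for any site.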
CPS 3 was a pure-latin1 application for various historical reasons, so we
modified a number of I/O adapters so that they properly encode/decode what
GenericSetup works with. CPS 3.4 is moving away from hardcoded latin-1
toward the site's default_charset property, but that hasn't been done
everywhere yet, although it should be.
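In such an adapter, replacing a hardcoded latin-1 mostly means looking the charset up on the site. The following minimal sketch assumes a site object with a Zope-style getProperty(id, default) method; DummySite is a hypothetical stand-in. It keeps latin-1 only as the fallback for sites without a default_charset property:

```python
LEGACY_CHARSET = "iso-8859-1"  # the charset CPS 3 used to hardcode


def site_charset(site):
    """Return the site's default_charset, falling back to latin-1."""
    return site.getProperty("default_charset", LEGACY_CHARSET) or LEGACY_CHARSET


def decode_for_setup(site, value):
    """Hypothetical adapter step: stored bytes -> text for GenericSetup."""
    return value.decode(site_charset(site))


def encode_from_setup(site, text):
    """Hypothetical adapter step: text from GenericSetup -> stored bytes."""
    return text.encode(site_charset(site))


class DummySite:
    """Hypothetical site exposing a PropertyManager-like getProperty()."""

    def __init__(self, **props):
        self._props = props

    def getProperty(self, name, default=None):
        return self._props.get(name, default)
```

Under these assumptions, byte 0xA4 decodes to '€' on a site configured for iso8859-15 but to '¤' on a legacy latin-1 site. This illustrates why the hardcoded value has to go.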
CPS 4 will be purely unicode, and won't need all that shit.
Florent
--
Florent Guillaume, Nuxeo (Paris, France) Director of R&D
+33 1 40 33 71 59 http://nuxeo.com [EMAIL PROTECTED]
_______________________________________________
Zope-CMF maillist - [email protected]
http://mail.zope.org/mailman/listinfo/zope-cmf
See http://collector.zope.org/CMF for bug reports and feature requests