On Feb 20, 2013, at 12:47, Marius Gedminas <mar...@gedmin.as> wrote:
> On Wed, Feb 20, 2013 at 09:57:52AM +0100, Wichert Akkerman wrote:
>> I want to propose that we do the following:
>> Allow arbitrary str object ids in Zope 4 based on Dieter's work.
>> Configure standard name choosers, normalizers, etc. used to generate
>> ids for content to create UTF-8 object ids.
> *puts Python 3 porter hat on*
> Why UTF-8 byte strings instead of Unicode?
One subtle difference: I am suggesting byte strings that default to UTF-8
encoding when ids are generated, but that can otherwise contain anything.
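To make that concrete, here is a rough sketch of what I have in mind. The
make_id function and the dict standing in for a container are hypothetical
names for illustration only, not existing Zope APIs:

```python
def make_id(title):
    """Hypothetical name chooser: turn a text title into a UTF-8 byte-string id."""
    normalized = "-".join(title.strip().lower().split())
    return normalized.encode("utf-8")

# Generated ids are UTF-8 encoded byte strings by default...
generated_id = make_id("Café Menu")

# ...but an id supplied from elsewhere can be arbitrary bytes. These 16 bytes
# stand in for a binary identifier; they start with 0xff, so they are not
# valid UTF-8 and cannot be treated as text.
binary_id = bytes.fromhex("ff00112233445566778899aabbccddee")

# A plain dict stands in for a container keyed by byte-string ids.
container = {generated_id: "a document", binary_id: "a user"}
```

Both kinds of id can live side by side, and only the generated one makes any
promise about its encoding.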
I have several reasons for not wanting to use Unicode. One is that I am
worried it will break too much existing code; it is likely to require
much more invasive changes in both Zope itself and third-party packages. A
second reason is that I feel that using Unicode for an id is just wrong: an id
can be any binary thing and does not need to be text. Consider, for example, the
unique ID for a user in Active Directory (ObjectGUID). An id can also be text in a
script that Unicode does not support
(http://www.unicode.org/standard/unsupported.html has a list), or it can come
from a source that uses an unknown encoding or multiple encodings at the same
time, such as a filesystem or WebDAV users. For examples of the last, see the
crazy hoops Python 3 has to jump through to expose files via a Unicode API;
even with all its magic tricks it still does not seem to work perfectly. The
vast majority of object ids are text, but the ability to support other types of
ids is extremely useful. So my suggestion would be: use Unicode for object
titles, descriptions, etc., but stick with byte strings for ids.
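As an illustration of the hoops mentioned above, this is roughly what the
"surrogateescape" error handler (PEP 383), which Python 3 uses for file names
on POSIX systems, does with a name that is not valid UTF-8. The file name is
made up for the example:

```python
# A Latin-1 encoded file name; byte 0xe9 makes it invalid as UTF-8.
raw_name = b"caf\xe9.txt"

# Undecodable bytes are smuggled into the str as lone surrogate code points.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

# But the str is not well-formed Unicode, so a strict encode fails.
try:
    text_name.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

The round trip is lossless, but the intermediate string breaks as soon as it
reaches code that expects real text. A byte-string id avoids the problem
entirely.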
Zope-Dev maillist - Zope-Dev@zope.org