Jeff Shell wrote:
[..]

I feel like I know enough to squeak by, but that's no longer
acceptable. Sometimes I quiver in terror, waiting for everything to
fall down because of something as seemingly basic as strings/text.
It may be a lot of technical debt, or it may be extremely easy to pay
down. In any case, it's time to pay it down. :)


In addition to what has already been said (and maybe not quite
what you are asking for), the number one problem people run into
can be illustrated with a simple interactive Python session
like this one:

[EMAIL PROTECTED] ~]$ python244
Python 2.4.4 (#1, Jan 29 2007, 13:00:46)
[GCC 4.1.1 20060525 (Red Hat 4.1.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s1 = u"I'm unicode"
>>> s2 = "I'm a byte string"
>>> s1 + s2
u"I'm unicodeI'm a byte string"
>>> s1 = u"I'm unicode äöü"
>>> s1
u"I'm unicode \xe4\xf6\xfc"
>>> s2 = "I'm a byte string äöü"
>>> s2
"I'm a byte string \xc3\xa4\xc3\xb6\xc3\xbc"
>>> s1 + s2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)
>>> s1 + s2.decode('utf-8')
u"I'm unicode \xe4\xf6\xfcI'm a byte string \xe4\xf6\xfc"
>>>

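The culprit is Python's implicit conversion of byte strings to
unicode strings whenever the two are combined. That conversion
decodes the bytes with Python's default encoding, which is 'ascii'
(don't ask why; it was the subject of a long discussion at the
time...). As a result, any byte outside the ASCII range makes the
operation fail.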
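
To make that hidden step visible, here is a small sketch in Python 3
that reproduces the Python 2 session above. Python 3 no longer coerces
bytes to text implicitly, so the decode is written out by hand. The name
py2_style_concat is made up for this illustration:

```python
# Sketch (Python 3): emulate the implicit coercion Python 2 performs when a
# unicode string and a byte string are combined with "+".
PY2_DEFAULT_ENCODING = "ascii"  # the default encoding of a stock Python 2


def py2_style_concat(text, data, encoding=PY2_DEFAULT_ENCODING):
    """Concatenate text and bytes the way Python 2 would: decode bytes first."""
    if isinstance(data, bytes):
        # The hidden step: bytes -> text using the *default* codec,
        # not the codec the bytes were actually written in.
        data = data.decode(encoding)
    return text + data


s1 = "I'm unicode äöü"
s2 = "I'm a byte string äöü".encode("utf-8")

try:
    py2_style_concat(s1, s2)
except UnicodeDecodeError as exc:
    print(exc.start, hex(s2[exc.start]))  # prints: 18 0xc3

# Naming the right codec explicitly is what makes it work:
print(py2_style_concat(s1, s2, encoding="utf-8"))
```

The failing position (18) and byte (0xc3) match the traceback in the
session. Pure-ASCII byte strings decode fine, which is why the first
concatenation in the session "worked".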

When the strings involved come from some application call or
third-party code, you often don't know whether you will get bytes
or unicode, let alone which encoding the bytes use. So unless you
check what you receive and decode it explicitly with the right
codec, you'll never be on the safe side.
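
One way to follow that advice is to normalise every incoming value once,
at the edge of your code. The following Python 3 sketch shows such a
boundary helper. The name to_text and its parameters are hypothetical and
do not belong to any library. The helper cannot guess the codec: the
encoding argument still has to come from what you know about the source.

```python
# Hypothetical boundary helper (Python 3): turn whatever an application call
# or third-party library hands back into text, decoding bytes with a codec
# you name explicitly instead of relying on a default.
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as str; decode bytes-like input with the given codec."""
    if isinstance(value, str):
        return value  # already text, nothing to do
    if isinstance(value, (bytes, bytearray)):
        # errors="strict" raises UnicodeDecodeError on undecodable bytes;
        # pass errors="replace" to substitute U+FFFD instead.
        return bytes(value).decode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# Usage: assume this source is known to deliver Latin-1 encoded bytes.
from_library = "äöü".encode("latin-1")
print(to_text(from_library, encoding="latin-1"))  # prints: äöü
```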

Within Zope 3, however, this should be a non-issue, as Zope 3
(including Zope 3-based applications) should use only unicode
strings.
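
Applied to a whole application, that rule means the mixed case never
comes up: you decode on the way in, handle only unicode inside, and encode
on the way out. The following is a minimal plain Python 3 sketch of that
discipline. It is not Zope 3 code, and greet() and to_wire() are made-up
names:

```python
# Sketch (Python 3): application code works only with text; bytes appear
# solely at the output boundary, encoded with an explicitly chosen codec.
def greet(name):
    """Internal API: accepts and returns text only (hypothetical example)."""
    if not isinstance(name, str):
        raise TypeError("greet() expects text, got %s" % type(name).__name__)
    return "Hello, " + name + "!"


def to_wire(text, encoding="utf-8"):
    """Output boundary: the one place where text becomes bytes."""
    return text.encode(encoding)


print(to_wire(greet("Jürgen")))  # prints: b'Hello, J\xc3\xbcrgen!'
```

Rejecting bytes at the internal API makes a mistake fail loudly at its
origin. Otherwise the mistake would surface later as a UnicodeDecodeError
somewhere unrelated.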

Just my 2 cents.

Raphael


_______________________________________________
Zope3-users mailing list
Zope3-users@zope.org
http://mail.zope.org/mailman/listinfo/zope3-users
