azurIt wrote at 2012-2-20 14:09 +0100:
> ...
>How I found this out:
>I'm using 'unicodedata.normalize' in one of my API functions and it _requires_ 
>that the second argument is unicode and not str. Since rpclib was always returning 
>unicode, I just passed it to normalize(). This stopped working after the lxml 
>upgrade because suddenly there were situations in which I got str instead of 
>unicode. The fix was, of course, very easy, but I don't consider this good 
>behavior - one really can't expect a value to be 'unicode' in one situation 
>and 'str' in another. It should always be 'str' or always 'unicode'.

What you see is the effect of wishing to be nondisruptive with the
Python past.

Unlike Java, Python is quite an old language and it did not anticipate
that in its future the world would adopt the "text is unicode" paradigm.
Thus, it worked with encoded strings ("str") to represent text.

We, and the Python developers, now know that this is a handicap,
and that is the primary reason for Python 3, the backward
incompatible new branch of Python development.

In Python 2, one wanted to support a transition towards
the modern paradigm but avoid as much disruption as possible.
This has led to complex implementations (such as "return str for
ASCII-only XML text") that usually work but can sometimes have
surprising effects.
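One defensive way to cope with this, sketched here as an assumption rather than anything rpclib or lxml provides, is to coerce every incoming value to text before handing it to a C-implemented function such as unicodedata.normalize. The helper names ensure_text and normalize_text are hypothetical, and the text_type shim simply picks "unicode" on Python 2 and "str" on Python 3 (where the same split shows up as bytes versus str). Since lxml only returns str for ASCII-only text, decoding with UTF-8 (a superset of ASCII) should be safe.

```python
import unicodedata

# Pick the text type for the running interpreter:
# "unicode" on Python 2, "str" on Python 3.
try:
    text_type = unicode  # noqa: F821 (only defined on Python 2)
except NameError:
    text_type = str


def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding encoded strings (Py2 str / Py3 bytes)."""
    if isinstance(value, text_type):
        return value
    # ASCII-only input (what lxml hands back as str) decodes identically under UTF-8.
    return value.decode(encoding)


def normalize_text(value, form="NFC"):
    """Hypothetical wrapper: normalize value whether it arrives encoded or as text."""
    return unicodedata.normalize(form, ensure_text(value))
```

With this wrapper, normalize_text(b"abc") and normalize_text(u"abc") both return the same text object, so the API function no longer depends on which type the parser happened to produce.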


While Python usually treats unicode and str encoded with the
"defaultencoding" as equivalent, there are some important exceptions:

   *  functions implemented in C often accept either "str" or
      "unicode", but usually not both.

      I expect that this applies to "unicodedata.normalize".

   *  hash codes for "str" and "unicode" differ (at least
      outside the ASCII code page). Therefore mixing
      "str" and "unicode" keys for dictionaries can give
      surprising results.
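The dictionary pitfall can be reproduced with a short, illustrative example that runs on both Python 2 and Python 3: a text key and its UTF-8 encoded form end up as two separate entries. Normalizing keys with an ensure_text-style helper (again a hypothetical name, not a library function) collapses them into one.

```python
# "unicode" on Python 2, "str" on Python 3.
try:
    text_type = unicode  # noqa: F821 (only defined on Python 2)
except NameError:
    text_type = str


def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: decode encoded strings, pass text through unchanged."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


word = u"caf\u00e9"  # non-ASCII text

# Mixing text and encoded keys: the two forms neither compare equal
# nor share a hash, so the dict keeps two separate entries.
mixed = {}
mixed[word] = "text key"
mixed[word.encode("utf-8")] = "encoded key"

# Normalizing every key to text first yields a single entry.
normalized = {}
for key in (word, word.encode("utf-8")):
    normalized[ensure_text(key)] = "value"
```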



--
Dieter
_______________________________________________
Soap mailing list
[email protected]
http://mail.python.org/mailman/listinfo/soap
