Burak Arslan wrote:
> [...]
> Considering all this, my decision is to separate String and
> Unicode for Python 2, as;
>
> Explicit is better than implicit.
>
> If you disagree,
> speak now or forever hold your silence :))
>
I took the liberty of checking this on the lxml mailing list - hope you don't mind.

Here are the relevant sections of the response -

1.
> > With Python 2 and version 2.2.2 (I don't have 2.3.3), if you pass a unicode
> > string that contains a non-ASCII character, you get a unicode string back.
> > If you pass a unicode string that contains only ASCII characters, you get a
> > normal string back.
> > This behaviour is causing a problem to a user of rpclib.
> It shouldn't normally. Apparently, the problem was that the user passed the
> result into unicodedata.normalize(), which rejected Py2-str as input.
> Sounds like a bug in the unicodedata module to me, since str is supposed to
> auto-decode into Unicode automatically on Py2.

The response also had a link to a more detailed discussion, which explained that lxml does it that way to maintain compatibility with ElementTree.

2.
> > so the proposal is that rpclib should always convert the string to unicode
> > before returning it.
> It's perfectly ok, they can just wrap it in unicode() when running in Py2,
> or concatenate it with the empty unicode string. All that will change is
> the type of the object (well, and its memory consumption and the time it
> takes to build it, but I don't think that matters here).

3.
> > With Python 3 and version 2.3.3, if you pass a unicode string to
> > etree.fromstring(...), and then retrieve a text node from the tree, you get
> > a unicode string back. If you pass in a byte array, you get a byte array
> > back.
> No, you always get a Unicode string for names and text in Python 3,
> regardless of what you used for parsing (or tree building in general).

I checked, and this is correct, so what I posted earlier was wrong. Sorry for the misinformation.

Frank Millman

_______________________________________________
Soap mailing list
[email protected]
http://mail.python.org/mailman/listinfo/soap
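
P.S. To illustrate point 2, wrapping the returned value explicitly could look roughly like the sketch below. The names to_text and text_type are made up for this example and are not part of rpclib or lxml. The ASCII default is an assumption based on point 1, where lxml only hands back a plain Py2 str when the text is pure ASCII.

```python
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
# The `unicode` name is only evaluated when running under Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(value, encoding="ascii"):
    """Return `value` as the interpreter's text type (hypothetical helper)."""
    if isinstance(value, text_type):
        # Already text: nothing to do.
        return value
    # A Py2 `str` (or Py3 `bytes`): decode explicitly instead of relying
    # on implicit auto-decoding further down the line.
    return value.decode(encoding)
```

On Python 2, concatenating with the empty unicode string, as suggested in the reply, should give the same result for ASCII-only input; the helper just makes the conversion explicit.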
