On Tue, 11 Nov 2008 12:18:26 -0800, Thierry wrote:

> I have realized an wxPython simple application, that takes the input of
> a user, send it to a web service, and get back translations in several
> languages.
> The service itself is fully UTF-8.
>
> The "source" string is first encoded to "latin1" after a passage into
> unicode.normalize(), as urllib.quote() cannot work on unicode
>>>srcText=unicodedata.normalize('NFKD',srcText).encode('latin1','ignore')

If the service uses UTF-8, why don't you just encode the data you send as
UTF-8, instead of Latin-1, which potentially throws away data because of
the 'ignore' argument!?  Make that ``src_text = src_text.encode('utf-8')``.

>>>req=urllib2.urlopen(con)
>
> First problem, how to determine the encoding of the return ? If I
> inspect a request from firefox, I see that the server return header
> specify UTF-8
> But if I use this code:
>>>ret=U''
>>>for line in req:
>>>    ret=ret+string.replace(line.strip(),'\n',chr(10))
> I end up with an UnicodeDecodeError.

Because `line` contains bytes and `ret` is a `unicode` object.  If you add
a `unicode` object and a `str` object, Python tries to convert the `str` to
`unicode` using the default encoding, which is ASCII.  This fails if there
are byte values >127.  *You* have to decode `line` from a bunch of bytes to
a bunch of (unicode) characters before you concatenate the strings.

BTW: ``line.strip()`` removes all whitespace at both ends *including
newlines*, so there are no '\n' left to replace.  And functions in the
`string` module that are also implemented as methods on `str` or `unicode`
are deprecated.

Ciao,
    Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list
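
P.S.: A minimal sketch of both points, in case it helps.  It is only an
illustration, not your actual code: the names `encode_for_url`,
`decode_lines` and `fake_response` are made up, the sample bytes stand in
for whatever the web service really returns, and I assume the response
really is UTF-8 as the header claims.  The import is written so that the
snippet also runs on Python 3, where `quote()` lives in `urllib.parse`.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def encode_for_url(text):
    """Encode a unicode string as UTF-8 bytes, then percent-quote them."""
    return quote(text.encode('utf-8'))


def decode_lines(byte_lines, encoding='utf-8'):
    """Decode every byte line *before* joining, so that only unicode
    objects are ever concatenated.  Like the original loop, this strips
    surrounding whitespace (newlines included) from each line."""
    return u''.join(line.decode(encoding).strip() for line in byte_lines)


# Hypothetical stand-in for iterating over the urlopen() result:
# two lines of UTF-8 encoded bytes containing non-ASCII characters.
fake_response = [b'caf\xc3\xa9\n', b' na\xc3\xafve\n']

print(encode_for_url(u'caf\xe9'))     # prints caf%C3%A9
text = decode_lines(fake_response)    # a unicode object, no decode error
```

In the real program you would pass `req` instead of `fake_response`.  As
for finding out the encoding: on Python 2 the charset given in the
Content-Type header should be available via
``req.info().getparam('charset')``, which may be `None` if the server does
not send one.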