Hi all, I have a bit of a problem. I'm trying to use Python to work with some data that turns out to be garbage. Ultimately, I think the solution will be to .decode('utf-8') a string twice, but Python doesn't like doing this the second time. That might be understandable, but then why does the unicode object have a .decode() method at all?
I get 'WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden' at first. I .decode('utf-8') this to u'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden'. I then try to .decode('utf-8') this again, but that gives an error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Program Files\Python\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-19: ordinal not in range(128)

If I copy/paste 'WVL Algemeen Altru\xc3\xafsme genormeerd Afbeelden' and try to .decode('utf-8') it, that works fine, and it gets me the result I want, which is u'WVL Algemeen Altru\xefsme genormeerd Afbeelden'.

Why does it work this way? How can I make it work?

Regards,
Manuzhai
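
For reference, here is a minimal sketch of one way the double decode described above could be made to work. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the default ASCII codec, which is where the UnicodeEncodeError in the traceback comes from. Turning the intermediate text back into bytes explicitly avoids that step. The sketch is written for Python 3, the helper name undo_double_utf8 is hypothetical, and it assumes the extra layer of encoding came from UTF-8 bytes being misread as Latin-1 (ISO-8859-1); that holds for the sample string, but may not hold for all of the data.

```python
# The doubly encoded value from the post, as a Python 3 bytes literal.
raw = b'WVL Algemeen Altru\xc3\x83\xc2\xafsme genormeerd Afbeelden'


def undo_double_utf8(data):
    """Hypothetical helper: reverse one extra round of UTF-8 encoding.

    Assumes the inner UTF-8 bytes were misread as Latin-1 before being
    encoded to UTF-8 a second time.
    """
    once = data.decode('utf-8')      # text whose code points are really byte values
    inner = once.encode('latin-1')   # map each code point 0-255 back to one byte
    return inner.decode('utf-8')     # decode the recovered inner UTF-8 bytes


fixed = undo_double_utf8(raw)
print(fixed)  # WVL Algemeen Altruïsme genormeerd Afbeelden
```

The same chain should work in Python 2 as s.decode('utf-8').encode('latin-1').decode('utf-8'). If some of the data was misread as a different codec, such as cp1252, the middle step would need to use that codec instead.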