Re: [UnicodeEncodeError] Don't know what else to try
On Sat, 15 Nov 2008 14:12:42 +0100, Gilles Ganault wrote:

> On Fri, 14 Nov 2008 17:39:00 +0100, "Martin v. Löwis"
> <[EMAIL PROTECTED]> wrote:
>>Can you first please report what happened when you add the print
>>statement?
>
> Thanks guys, I found how to handle this:
>
> ===
> for id in rows:
>     #Says Unicode, but it's actually not
>     #print type(id[1])
>     #

If it says `unicode` *it is* `unicode`.

>     try:
>         print id[1];
>     except UnicodeEncodeError:
>         print "Not unicode"

But it *is* `unicode` if `type()` says so!

Your code still fails when ``id[1]`` can't be encoded in
`sys.stdout.encoding`, 'iso8859-15', or 'cp1252'. Even worse: the
output may end up in a mix of different encodings this way. That's
garbage you can't decode properly with one encoding anymore.

A clean solution would be just one ``print`` with a call of `encode()`
and an explicit encoding. I'd use 'utf-8' as the default but give the
user of the program a way to choose another.

Ciao,
        Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list
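A minimal sketch of the "encode once, with an explicit encoding" idea
from the reply above. It is written to run on Python 3 as well as late
Python 2. The helper name `emit`, its parameters, and the choice of
`errors="replace"` are assumptions made for illustration, not code from
the thread.

```python
import io
import sys


def emit(text, encoding="utf-8", stream=None):
    """Encode text exactly once, with an explicit encoding, and write the bytes."""
    if stream is None:
        # Python 3 exposes the byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout already accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    # Assumption: "replace" keeps all output in ONE encoding instead of
    # raising when a character is missing from that encoding.
    data = (text + u"\n").encode(encoding, "replace")
    stream.write(data)
    return data


# Usage: every value goes through the same, explicitly chosen encoding.
sink = io.BytesIO()
for value in [u"caf\xe9", u"\u20ac 5"]:
    emit(value, encoding="utf-8", stream=sink)
```

The `encoding` argument is where a user-supplied choice (for example
from a command-line option) could be passed in, with 'utf-8' as the
default.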
Re: [UnicodeEncodeError] Don't know what else to try
On Fri, 14 Nov 2008 17:39:00 +0100, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:
>Can you first please report what happened when you add the print
>statement?

Thanks guys, I found how to handle this:

===
for id in rows:
    #Says Unicode, but it's actually not
    #print type(id[1])
    #
    try:
        print id[1];
    except UnicodeEncodeError:
        print "Not unicode"
        try:
            print id[1].encode('iso8859-15')
            print "iso"
        except UnicodeEncodeError:
            print id[1].encode('cp1252')
            print "Windows"
===

Thank you.
Re: [UnicodeEncodeError] Don't know what else to try
Gilles Ganault wrote:
> On Fri, 14 Nov 2008 11:01:27 +0100, "Martin v. Löwis"
> <[EMAIL PROTECTED]> wrote:
>> Add
>>    print type(output)
>> here. If it says "unicode", reconsider the next line
>>
>>>    print output.decode('utf-8')
>
> In case the string fetched from a web page turns out not to be Unicode
> and Python isn't happy, what is the right way to handle this, know
> what codepage is being used?

Can you first please report what happened when you add the print
statement?

Thanks,
Martin
Re: [UnicodeEncodeError] Don't know what else to try
On Fri, 14 Nov 2008 14:57:42 +0100, Gilles Ganault wrote:

> On Fri, 14 Nov 2008 11:01:27 +0100, "Martin v. Löwis"
> <[EMAIL PROTECTED]> wrote:
>>Add
>>print type(output)
>>here. If it says "unicode", reconsider the next line
>>
>>>     print output.decode('utf-8')
>
> In case the string fetched from a web page turns out not to be Unicode
> and Python isn't happy, what is the right way to handle this, know what
> codepage is being used?

How do you fetch the data? If you simply download it with `urllib` or
`urllib2` you never get `unicode` but ordinary `str`\s. Then you have
to figure out the encoding by looking at the headers from the server
and/or looking at the fetched data if it contains hints.

And when ``print``\ing you should explicitly *encode* the data again,
because sooner or later you will come across a `stdout` where Python
can't determine what the process at the other end expects, for example
if output is redirected to a file.

Ciao,
        Marc 'BlackJack' Rintsch
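A rough sketch of the "look at the headers from the server" step
described above, assuming the Content-Type header value and the raw
bytes have already been fetched. The function names
`charset_from_content_type` and `decode_body` and the 'utf-8' fallback
are hypothetical choices for illustration; only
`email.message.Message.get_content_charset()` is a standard library
call. Hints inside the fetched data itself (such as an HTML meta tag)
are not handled here.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type header, or default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset name lower-cased,
    # or the given default when the header declares none.
    return msg.get_content_charset(default)


def decode_body(raw_bytes, header_value, fallback="utf-8"):
    """Decode fetched bytes with the server-declared charset."""
    charset = charset_from_content_type(header_value, fallback)
    try:
        return raw_bytes.decode(charset, "replace")
    except LookupError:
        # The server named a codec Python does not know; use the fallback.
        return raw_bytes.decode(fallback, "replace")


# Usage with a header value as a server might send it.
text = decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1")
```

After this step the program holds real text and can encode it
explicitly again on output.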
Re: [UnicodeEncodeError] Don't know what else to try
On Fri, 14 Nov 2008 11:01:27 +0100, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:
>Add
>print type(output)
>here. If it says "unicode", reconsider the next line
>
>>     print output.decode('utf-8')

In case the string fetched from a web page turns out not to be Unicode
and Python isn't happy, what is the right way to handle this, know
what codepage is being used?

Thank you.
Re: [UnicodeEncodeError] Don't know what else to try
>     print output.decode('utf-8')
>   File "C:\Python25\lib\encodings\utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in
> position 47: ordinal not in range(128)

Notice that it complains about the 'ascii' codec, when you were using
the utf-8 codec. Also, it complains about *en*coding, when you try
decoding.

For this, there could be two possible causes:
1. output is already a Unicode object - decoding it is not a meaningful
   operation. So Python first *en*codes it with ascii, then would
   decode it with UTF-8. The first step fails.
2. decoding from utf-8 works fine. It then tries to print output,
   which requires an encoding. By default, it encodes as ascii, which
   fails.

> Here's the code:
>
> try:
>     output = "Different: (%s) %s : %s # %s" %
> (id[2],id[3],id[0],id[1])

Add

    print type(output)

here. If it says "unicode", reconsider the next line

>     print output.decode('utf-8')

Regards,
Martin
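The first cause listed above can be spelled out as two explicit steps.
This is only an illustration: Python 2 performs the first step
implicitly when `.decode()` is called on a `unicode` object, while
Python 3 has no `str.decode()` at all. The function name
`decode_like_python2` and the sample string are made up for the
example.

```python
def decode_like_python2(value, encoding="utf-8"):
    """Mimic Python 2 calling .decode() on an already-decoded text object."""
    # Step 1 (implicit in Python 2): turn the text into a byte string
    # using the default 'ascii' codec. Non-ASCII characters fail here.
    as_bytes = value.encode("ascii")
    # Step 2: decode those bytes with the codec the caller asked for.
    return as_bytes.decode(encoding)


error_name = None
error_codec = None
try:
    decode_like_python2(u"r\xe2le")
except UnicodeEncodeError as exc:
    # An *encode* error naming the 'ascii' codec, although the caller
    # only asked to *decode* with utf-8.
    error_name = type(exc).__name__
    error_codec = exc.encoding
```

Pure ASCII text passes through both steps unchanged, which is why such
code can appear to work until the first accented character shows up.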