Re: [UnicodeEncodeError] Don't know what else to try

2008-11-15 Thread Marc 'BlackJack' Rintsch
On Sat, 15 Nov 2008 14:12:42 +0100, Gilles Ganault wrote:

> On Fri, 14 Nov 2008 17:39:00 +0100, "Martin v. Löwis"
> <[EMAIL PROTECTED]> wrote:
>>Can you first please report what happened when you add the print
>>statement?
> 
> Thanks guys, I found how to handle this:
> 
> ===
> for id in rows:
>   #Says Unicode, but it's actually not
>   #print type(id[1])
>   #

If it says `unicode` *it is* `unicode`.

>   try:
>       print id[1];
>   except UnicodeEncodeError:
>       print "Not unicode"

But it *is* `unicode` if `type()` says so!

Your code still fails when ``id[1]`` can't be encoded in
`sys.stdout.encoding`, 'iso8859-15', or 'cp1252'.  Even worse: the output
may end up in several different encodings this way.  That's garbage you
can't decode properly with a single encoding anymore.
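
To make the mixed-encoding problem concrete, here is a minimal sketch.  It
is written for Python 3 (the thread itself uses Python 2), and the sample
strings and the `encode_with_fallback` helper are made up for illustration;
the helper only mimics the 'iso8859-15' then 'cp1252' part of the fallback
chain.

```python
def encode_with_fallback(text):
    """Mimic the try/except chain: ISO-8859-15 first, then cp1252."""
    try:
        return text.encode("iso8859-15")
    except UnicodeEncodeError:
        return text.encode("cp1252")

# Hypothetical sample data: a euro sign and a horizontal ellipsis.
lines = [u"10 \u20ac", u"wait\u2026"]
original = u"\n".join(lines)

# Each line ends up in whichever codec accepted it first.
mixed = b"\n".join(encode_with_fallback(line) for line in lines)

# Try to read the whole output back with each single codec.
as_latin9 = mixed.decode("iso8859-15")
as_cp1252 = mixed.decode("cp1252")
print(as_latin9 == original, as_cp1252 == original)
```

The first line fits 'iso8859-15' and the second only fits 'cp1252', so
neither codec alone gives back the original text.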

A clean solution would be just one ``print`` with a call to `encode()`
and an explicit encoding.  I'd use 'utf-8' as the default but give the
user of the program a way to choose another one.
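
One way such a single, explicit encoding step might look, sketched for
Python 3 rather than the Python 2 used in this thread.  The `--encoding`
option name and the `write_line` and `parse_args` helpers are assumptions
for illustration, not something from the original program.

```python
import argparse
import sys

def write_line(text, encoding="utf-8", stream=None):
    """Encode text explicitly and write the bytes to a binary stream."""
    if stream is None:
        stream = sys.stdout.buffer  # byte-level stdout in Python 3
    stream.write((text + u"\n").encode(encoding))

def parse_args(argv=None):
    """Let the user override the default output encoding."""
    parser = argparse.ArgumentParser()
    # Hypothetical option name; UTF-8 is the default.
    parser.add_argument("--encoding", default="utf-8")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    # Every line goes through the same codec, so the output stays decodable.
    write_line(u"caf\xe9 \u20ac", args.encoding)
```

If the user picks an encoding that cannot represent a character, this
still raises `UnicodeEncodeError`; passing an `errors` argument such as
'replace' to `encode()` would be one way to relax that.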

Ciao,
Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list


Re: [UnicodeEncodeError] Don't know what else to try

2008-11-15 Thread Gilles Ganault
On Fri, 14 Nov 2008 17:39:00 +0100, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:
>Can you first please report what happened when you add the print statement?

Thanks guys, I found how to handle this:

===
for id in rows:
    #Says Unicode, but it's actually not
    #print type(id[1])
    #

    try:
        print id[1];
    except UnicodeEncodeError:
        print "Not unicode"
        try:
            print id[1].encode('iso8859-15')
            print "iso"
        except UnicodeEncodeError:
            print id[1].encode('cp1252')
            print "Windows"
===

Thank you.


Re: [UnicodeEncodeError] Don't know what else to try

2008-11-14 Thread Martin v. Löwis
Gilles Ganault wrote:
> On Fri, 14 Nov 2008 11:01:27 +0100, "Martin v. Löwis"
> <[EMAIL PROTECTED]> wrote:
>> Add
>>print type(output)
>> here. If it says "unicode", reconsider the next line
>>
>>> print output.decode('utf-8')
> 
> In case the string fetched from a web page turns out not to be Unicode
> and Python isn't happy, what is the right way to handle this and find
> out which codepage is being used?

Can you first please report what happened when you add the print statement?

Thanks,
Martin


Re: [UnicodeEncodeError] Don't know what else to try

2008-11-14 Thread Marc 'BlackJack' Rintsch
On Fri, 14 Nov 2008 14:57:42 +0100, Gilles Ganault wrote:

> On Fri, 14 Nov 2008 11:01:27 +0100, "Martin v. Löwis"
> <[EMAIL PROTECTED]> wrote:
>>Add
>>print type(output)
>>here. If it says "unicode", reconsider the next line
>>
>>> print output.decode('utf-8')
> 
> In case the string fetched from a web page turns out not to be Unicode
> and Python isn't happy, what is the right way to handle this and find
> out which codepage is being used?

How do you fetch the data?  If you simply download it with `urllib` or
`urllib2` you never get `unicode` but ordinary `str`\s.  Then you have to
figure out the encoding by looking at the headers from the server and/or
looking at the fetched data if it contains hints.
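
As a rough sketch of the header part (Python 3, standard library only; the
helper names and the sample header values are made up for illustration),
the charset can be read from the ``Content-Type`` header and used to
decode the raw bytes.  Hints inside the data itself, such as a ``meta``
tag, are not covered here.

```python
from email.message import Message

def charset_from_content_type(content_type, default=None):
    """Return the charset parameter of a Content-Type value, if present."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)

def decode_body(raw_bytes, content_type, fallback="utf-8"):
    """Decode fetched bytes with the declared charset, else a fallback."""
    charset = charset_from_content_type(content_type) or fallback
    # An unknown charset name would raise LookupError here.
    return raw_bytes.decode(charset, errors="replace")

# Hypothetical response: bytes plus the header value the server sent.
text = decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1")
print(repr(text))
```

The fallback of 'utf-8' is only a guess for servers that declare nothing;
a wrong guess shows up as replacement characters instead of an exception.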

And when ``print``\ing you should explicitly *encode* the data again 
because sooner or later you will come across a `stdout` where Python 
can't determine what the process at the other end expects, for example if 
output is redirected to a file.
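
A small sketch of that situation, again assuming Python 3; the
`RedirectedOutput` class is a made-up stand-in for a `stdout` whose
encoding could not be determined, and the helper names are hypothetical.

```python
def pick_encoding(stream, default="utf-8"):
    """Use the stream's declared encoding, or a default when none is known."""
    return getattr(stream, "encoding", None) or default

def encode_for(stream, text, default="utf-8"):
    """Encode text explicitly instead of relying on an implicit codec."""
    return text.encode(pick_encoding(stream, default), errors="replace")

class RedirectedOutput(object):
    """Stand-in for an output whose encoding is unknown."""
    encoding = None

class Latin1Terminal(object):
    """Stand-in for a terminal that declares an encoding."""
    encoding = "iso8859-1"

to_file = encode_for(RedirectedOutput(), u"caf\xe9")
to_terminal = encode_for(Latin1Terminal(), u"caf\xe9")
print(repr(to_file), repr(to_terminal))
```

Falling back to a fixed default keeps the program from failing when the
encoding is unknown, at the cost of having to document which encoding the
redirected output uses.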

Ciao,
Marc 'BlackJack' Rintsch


Re: [UnicodeEncodeError] Don't know what else to try

2008-11-14 Thread Gilles Ganault
On Fri, 14 Nov 2008 11:01:27 +0100, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:
>Add
>print type(output)
>here. If it says "unicode", reconsider the next line
>
>>  print output.decode('utf-8')

In case the string fetched from a web page turns out not to be Unicode
and Python isn't happy, what is the right way to handle this and find
out which codepage is being used?

Thank you.


Re: [UnicodeEncodeError] Don't know what else to try

2008-11-14 Thread Martin v. Löwis
> print output.decode('utf-8')
>   File "C:\Python25\lib\encodings\utf_8.py", line 16, in decode
> return codecs.utf_8_decode(input, errors, True)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in
> position 47:
>  ordinal not in range(128)

Notice that it complains about the 'ascii' codec, when you were using
the utf-8 codec. Also, it complains about *en*coding, when you were
trying to decode.
There are two possible causes for this:

1. output is already a Unicode object - decoding it is not a meaningful
   operation. So Python first *en*codes it with ascii, then would decode
   it with UTF-8. The first step fails.
2. decoding from utf-8 works fine. It then tries to print output, which
   requires an encoding. By default, it encodes as ascii, which fails.
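
Both causes end in the same kind of error, which the following sketch
reproduces.  It assumes Python 3, where the implicit ASCII step of
Python 2 no longer exists, so the step is written out by hand; the sample
string and the `ascii_encode_error` helper are made up for illustration.

```python
import io

text = u"r\xe2le"  # contains u'\xe2', like the character in the traceback

def ascii_encode_error(action):
    """Run action and return the UnicodeEncodeError it raises, or None."""
    try:
        action()
    except UnicodeEncodeError as exc:
        return exc
    return None

# Cause 1: Python 2 implicitly did the equivalent of this before decoding.
err1 = ascii_encode_error(lambda: text.encode("ascii"))

# Cause 2: writing text to an output stream whose encoding is ASCII.
ascii_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
err2 = ascii_encode_error(lambda: ascii_stdout.write(text))

for err in (err1, err2):
    # Codec name, position, and the offending character.
    print(err.encoding, err.start, repr(err.object[err.start]))
```

In both cases the codec named in the error is 'ascii', not 'utf-8', which
is the clue pointed out above.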

> 
> 
> Here's the code:
> 
>   try:
>       output = "Different: (%s) %s : %s # %s" % (id[2],id[3],id[0],id[1])

Add
print type(output)
here. If it says "unicode", reconsider the next line

>   print output.decode('utf-8')

Regards,
Martin