Scott David Daniels wrote:
> To discover what is happening, try something like:
>     python -c 'for a in "ä", unicode("ä"): print len(a), a'
>
> I suspect that in your encoding, "ä" is two bytes long, and in
> unicode it is converted to a single character.

:> python -c 'for a in "ä", unicode("ä", "utf8"): print len(a), a'
2 ä
1 ä
:>

Yes it is. That is one of the two problems I see. The solution for this
is to call unicode(<string>, <coding>) on each string. I'd like to have
my Python programs Unicode-enabled.

:> python -c 'for a in "ä", unicode("ä"): print len(a), a'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

It seems that the default encoding is "ascii", so unicode() cannot cope
with "ä". If I specify "utf8" for the encoding, unicode() works.

:> python -c 'for a in "ä", unicode("ä", "utf8"): print len(a), a'
2 ä
1 ä
:>

But the print statement yields a UnicodeEncodeError if I pipe the output
to a program or a file.

:> python -c 'for a in "ä", unicode("ä", "utf8"): print len(a), a' | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
2 ä
1
:>

So it seems to me that piping the output changes the behavior of the
print statement:

:> python -c 'for a in "ä", unicode("ä", "utf8", "ignore"): print a, len(a), type(a)'
ä 2 <type 'str'>
ä 1 <type 'unicode'>
:> python -c 'for a in "ä", unicode("ä", "utf8", "ignore"): print a, len(a), type(a)' | cat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
ä 2 <type 'str'>
:>

How can I make my Python programs Unicode-enabled?

- Input strings can have different encodings (mostly ascii, latin_1
  or utf8).
- My Python programs should always output "utf8".

Is that a good idea?

TIA
-- 
Kurt Müller, m...@problemlos.ch
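
The two requirements listed above (input in ascii, latin_1 or utf8; output always utf8) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested recipe: the helper names to_text() and emit() are hypothetical, and the "try utf8 first, then fall back to latin_1" order is a guess at a workable heuristic. It cannot tell the encodings apart reliably, because latin_1 text that happens to form valid UTF-8 byte sequences will be decoded as utf8. The output side encodes explicitly instead of relying on print, so the result should not depend on whether stdout is a terminal or a pipe.

```python
import sys


def to_text(raw, encodings=("utf8", "latin_1")):
    """Decode a byte string, trying each candidate encoding in order.

    Hypothetical helper: ASCII is a subset of utf8, so plain ASCII
    input is covered by the first attempt.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passed no catch-all encoding such as
    # latin_1 (which accepts every byte value).
    return raw.decode("utf8", "replace")


def emit(text, stream=None):
    """Write text as UTF-8 bytes followed by a newline.

    Hypothetical helper: encoding explicitly avoids the implicit
    'ascii' codec that print falls back to when output is piped.
    """
    if stream is None:
        # Python 3 exposes the byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout already accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode("utf8") + b"\n")


if __name__ == "__main__":
    # "ä" as UTF-8 bytes, "ä" as latin_1 bytes, and plain ASCII.
    for raw in (b"\xc3\xa4", b"\xe4", b"a"):
        text = to_text(raw)
        emit(u"%d %s" % (len(text), text))
```

Run as a script, each of the three inputs is reported with length 1, and piping the output through cat should behave the same as writing to the terminal, since no implicit encoding step is left for print to perform.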