7stud wrote:

> Can anyone tell me why I can print out the individual variables in the
> following code, but when I print them out combined into a single
> string, I get an error?
>
> symbol = u'ibm'
> price = u'4 \xbd' # 4 1/2
>
> print "%s" % symbol
> print "%s" % price.encode("utf-8")
> print "%s %s" % (symbol, price.encode("utf-8") )
>
> --output:--
> ibm
> 4 1/2
> File "pythontest.py", line 6, in ?
> print "%s %s" % (symbol, price.encode("utf-8") )
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 2: ordinal not in range(128)

For format % args, if the format or any of the args is a unicode string,
the result will be unicode, too. This implies that byte strings have to be
decoded, and for that process the default ascii codec is used. In your
example

> print "%s %s" % (symbol, price.encode("utf-8") )

symbol is a unicode string, so Python tries to decode both "%s %s" and
"4 \xc2\xbd" (the result of price.encode("utf-8")). The latter contains
non-ascii bytes, so the decoding fails.

Solution: use unicode throughout and let the print statement do the
encoding.

>>> symbol = u"ibm"
>>> price = u"4 \xbd"
>>> print u"%s %s" % (symbol, price)
ibm 4 ½

Sometimes, e.g. if you redirect stdout to a file or a pipe, the above can
fail, because sys.stdout.encoding is then None and print falls back to
ascii. Here's a workaround that uses utf8 in such cases.

    import sys
    if sys.stdout.encoding is None:
        import codecs
        sys.stdout = codecs.lookup("utf8").streamwriter(sys.stdout)

Peter
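
The implicit decoding described above can be spelled out step by step. What follows is only an illustrative sketch, not code from the thread: the names encoded, error, combined and output_bytes are made up for the example, and the ascii decode is written out explicitly so that the snippet behaves the same on Python 2 and Python 3.

```python
# Illustrative sketch: makes explicit the ascii decode that Python 2
# performs implicitly when unicode and byte strings are mixed in "%".

symbol = u"ibm"
price = u"4 \xbd"                    # 4 1/2

encoded = price.encode("utf-8")      # byte string: '4 \xc2\xbd'

# Step 1: decoding the utf-8 bytes with the ascii codec fails on the
# first non-ascii byte (0xc2), as in the traceback from the question.
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    error = str(exc)
else:
    error = None

# Step 2: if both operands are unicode, no decoding step is needed.
combined = u"%s %s" % (symbol, price)

# Step 3: encode exactly once, at the point where the text is output.
output_bytes = combined.encode("utf-8")
```

The point of the design is that encoding happens in one place only, at the output boundary; everything before that stays unicode.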