Based on this example and the error:

-----
u_str = u"abc\u9999"
print u_str

UnicodeEncodeError: 'ascii' codec can't encode character u'\u9999' in position 3: ordinal not in range(128)
-----

it looks like when I try to display the string, the ascii codec processes each character in the string and fails when it reaches a numerical code that is higher than 127, i.e. the character \u9999.

In the following example, I use encode() to convert a unicode string to a regular string:

-----
u_str = u"abc\u9999"
reg_str = u_str.encode("utf-8")
print repr(reg_str)
-----

and the output is:

'abc\xe9\xa6\x99'

1) Why aren't the characters 'a', 'b', and 'c' in hex notation? It looks like python must be using the ascii codec to parse the characters in the string again, with the result being that python converts only the 1 byte numerical codes to characters.

2) Why didn't that cause an error like above for the 3 byte character?

Then if I try this:

-----
u_str = u"abc\u9999"
reg_str = u_str.encode("utf-8")
print reg_str
-----

I get the output:

abc<some chinese character>

Here it looks like python isn't using the ascii codec anymore.

3) What determines which codec python uses?
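
To check my reading of the two cases, here is a minimal sketch that does the ascii encoding explicitly and then looks at the utf-8 bytes one by one. It rests on two assumptions of mine: that print performs roughly this ascii encode when it has no other output encoding to use, and that repr() shows a byte as a plain character only when it falls in the printable ascii range. The names ascii_failed, failed_position, byte_values and printable are just for this illustration.

```python
u_str = u"abc\u9999"

# Encode with the ascii codec explicitly. My assumption is that this
# is roughly what print does when no other output encoding is known.
try:
    u_str.encode("ascii")
    ascii_failed = False
    failed_position = None
except UnicodeEncodeError as exc:
    ascii_failed = True
    failed_position = exc.start  # index of the first character that failed

# Encode with utf-8 and look at the resulting bytes as integers.
# bytearray yields integers when iterated.
byte_values = list(bytearray(u_str.encode("utf-8")))

# A byte is in the printable ascii range when it lies in 0x20..0x7e.
# Assumption: repr() uses \xNN escapes for bytes outside that range.
printable = [0x20 <= b <= 0x7e for b in byte_values]

print(ascii_failed)
print(failed_position)
print(byte_values)
print(printable)
```

If that is right, the three bytes that make up \u9999 are simply not printable ascii, so repr() escapes them instead of raising an error, while 'a', 'b' and 'c' are shown as themselves.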