Bugs item #1436532, was opened at 2006-02-22 10:45
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1436532&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: IDLE
Group: Python 2.4
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: James (hover_boy)
Assigned to: Martin v. Löwis (loewis)
Summary: length of unicode string changes print behaviour

Initial Comment:
Python 2.4.2 and IDLE (with Courier New font) on XP, with the following code saved as a UTF-8 file:

    if __name__ == "__main__":
        print "零 一 二 三 四 五 六 七 八"
        print "零 一 二 三 四 五 六 七 八 九 十"

results in...

    IDLE 1.1.2
    >>> ================================ RESTART ================================
    >>>
    é›¶ ä¸€ äºŒ ä¸‰ å›› äº” å…­ ä¸ƒ å…«
    零 一 二 三 四 五 六 七 八 九 十
    >>>

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2006-07-23 21:42

Message:
Logged In: YES
user_id=21627

This is not a bug. The program should not attempt to print byte strings, since it cannot know what the encoding of the byte strings is. Instead, the program should use Unicode strings, such as

    print u"八八八八八八八八八八八八八八八八八八八八八八八"

If you attempt to print byte strings, they have to be in the encoding of stdout, or else the behaviour is unspecified. In my installation/locale, sys.stdout.encoding is cp1250. IDLE's OutputWindow.write has this code:

        # Tk assumes that byte strings are Latin-1;
        # we assume that they are in the locale's encoding
        if isinstance(s, str):
            try:
                s = unicode(s, IOBinding.encoding)
            except UnicodeError:
                # some other encoding; let Tcl deal with it
                pass

Of the strings specified in the source file, only strings 2..5 decode properly as cp1250; the others don't. So the others get passed directly to Tcl, which then assumes they are UTF-8, with some fallback also.

The strings that look incorrect are actually printed out as designed: using sys.stdout.encoding.
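A minimal Python 3 sketch of the decode-or-pass-through logic quoted above may help show which byte strings come out garbled. Assumptions: idle_style_write is a hypothetical helper, not an IDLE API; cp1252 stands in for IOBinding.encoding, on the assumption that it is the ANSI code page of the English XP system the reporter describes; and Tcl's handling of undecodable bytes is modelled as a plain UTF-8 decode, ignoring the extra fallback Martin mentions. Under those assumptions the shorter string decodes without error as cp1252 and is shown as mojibake, while the longer one contains the bytes 0x9D (in 九) and 0x8D, 0x81 (in 十), which cp1252 leaves undefined, so it falls through to the UTF-8 path and displays correctly. That would account for the behaviour appearing to depend on string length.

```python
# Minimal sketch (Python 3) of the decode-or-pass-through logic in IDLE's
# OutputWindow.write. `idle_style_write` is a hypothetical helper, not an
# IDLE API, and cp1252 is an ASSUMED stand-in for IOBinding.encoding.

LOCALE_ENCODING = "cp1252"  # assumption: English Windows XP ANSI code page


def idle_style_write(data: bytes, encoding: str = LOCALE_ENCODING) -> str:
    """Return the text a byte string would appear as in the IDLE shell."""
    try:
        # Python 2 IDLE: s = unicode(s, IOBinding.encoding)
        return data.decode(encoding)
    except UnicodeDecodeError:
        # "some other encoding; let Tcl deal with it": modelled here as
        # Tcl reading the raw bytes as UTF-8 (its extra fallback is ignored).
        return data.decode("utf-8", errors="replace")


short = "零 一 二 三 四 五 六 七 八"
long_ = "零 一 二 三 四 五 六 七 八 九 十"

if __name__ == "__main__":
    # A UTF-8 source file gives Python 2 byte-string literals that hold
    # the UTF-8 encoding of the text, so encode to UTF-8 to mimic them.
    for text in (short, long_):
        shown = idle_style_write(text.encode("utf-8"))
        status = "displayed correctly" if shown == text else "mojibake"
        print(len(text.split()), "numerals:", status)
```

Run as a script, it reports the 9-numeral string as mojibake and the 11-numeral string as displayed correctly. Swapping LOCALE_ENCODING for "cp1250" lets you try Martin's locale instead.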
----------------------------------------------------------------------

Comment By: Kurt B. Kaiser (kbk)
Date: 2006-07-23 07:33

Message:
Logged In: YES
user_id=149084

I don't have a font installed which will print those characters. When I load your sample file, I see print statements which include Unicode characters like \u5341. The printed output contains the same Unicode characters as the input program.

Maybe Martin has an idea.

----------------------------------------------------------------------

Comment By: James (hover_boy)
Date: 2006-03-22 16:21

Message:
Logged In: YES
user_id=1458491

I've attached an example file to demonstrate the problem better. It seems not to be the length but something else, which I haven't figured out yet.

I've also added the encoding comment and tried changing the default encoding in sitecustomize.py from latin-1 to utf-8, but neither seems to work.

thanks,
James.

XP professional, SP2, english

----------------------------------------------------------------------

Comment By: James (hover_boy)
Date: 2006-03-22 16:12

Message:
Logged In: YES
user_id=1458491

----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2006-03-06 02:44

Message:
Logged In: YES
user_id=593130

I am fairly ignorant of Unicode and encodings, but I am surprised you got anything coherent without an encoding cookie comment at the top (see manual). Have you tried that?

Other questions that might help someone answer: What specific XP version? SP2 installed? Country version? Your results for

    >>> sys.getdefaultencoding()
    'ascii'
    >>> sys.getfilesystemencoding()
    'mbcs'

What happens if you reverse the order of the print statements? (I.e., is it really the shorter string that does not work, or just the first?)

I don't know enough to know if this is really a bug.
If you don't get an answer here, you might try for more info on python-list/comp.lang.python

----------------------------------------------------------------------