Bugs item #1436532, was opened at 2006-02-22 10:45
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1436532&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: IDLE
Group: Python 2.4
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: James (hover_boy)
Assigned to: Martin v. Löwis (loewis)
Summary: length of unicode string changes print behaviour

Initial Comment:
Using Python 2.4.2 and IDLE (with the Courier New font) on XP,
running the following code, saved as a UTF-8 file,

if __name__ == "__main__": 
    print "零 一 二 三 四 五 六 七 八" 
    print "零 一 二 三 四 五 六 七 八 九 十 "

results in...

IDLE 1.1.2 
>>> ================================ RESTART ================================
>>> 
é›¶ 一 二 三 å›› 五 å…七 å…« 
零 一 二 三 四 五 六 七 八 九 十 
>>> 

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2006-07-23 21:42

Message:
Logged In: YES 
user_id=21627

This is not a bug. The program should not attempt to print
byte strings, since it cannot know what the encoding of the
byte strings is. Instead, the program should use Unicode
strings, such as

    print u"八八八八八八八八八八八八八八八八八八八八八八"

If you attempt to print byte strings, they have to be in the
encoding of stdout, or else the behaviour is unspecified.
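A minimal Python 3 sketch of this point follows. It assumes Python 3's `bytes` stands in for a Python 2 byte string and `str` for a Python 2 `unicode` object, and it uses cp1252 purely as an example code page. The same three bytes read back as different text depending on which encoding the reader guesses. A Unicode string instead leaves the encoding step to the output stream, which knows its own encoding.

```python
import io

# Python 3 analogue: `str` is Unicode text, `bytes` plays the role of a
# Python 2 byte string such as "八" stored in a UTF-8 source file.
text = "八"
data = text.encode("utf-8")  # b'\xe5\x85\xab'

# The bytes carry no record of their encoding, so any reader has to guess.
as_utf8 = data.decode("utf-8")     # correct guess: the original character
as_cp1252 = data.decode("cp1252")  # wrong guess: mojibake

# With Unicode text, the stream encodes using its own declared encoding.
# An in-memory UTF-8 stream stands in for stdout here (an assumption);
# newline="\n" disables platform newline translation.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
print(text, file=out)
out.flush()
written = raw.getvalue()

# ascii() keeps the demo printable even on a console that cannot show CJK.
print(ascii(as_utf8), ascii(as_cp1252), written)
```

The cp1252 result is the same "å…«" sequence that appears for 八 in the report's first output line.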

In my installation/locale, sys.stdout.encoding is cp1250.
IDLE's OutputWindow.write has this code:

        # Tk assumes that byte strings are Latin-1;
        # we assume that they are in the locale's encoding
        if isinstance(s, str):
            try:
                s = unicode(s, IOBinding.encoding)
            except UnicodeError:
                # some other encoding; let Tcl deal with it
                pass

Of the strings specified in the source file, only strings
2..5 decode without error as cp1250; the others don't, so
they get passed directly to Tcl, which then assumes they are
UTF-8 (with some fallback). The strings that look incorrect
are actually printed out as designed: they were decoded
using sys.stdout.encoding.
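The decode-or-pass-through step quoted above can be re-created in Python 3 to show why the first line of the original report comes out garbled while the second does not. This is a hypothetical sketch, not IDLE's code. The helper name `as_displayed`, the cp1252 code page (assumed for the reporter's English XP install, as the "å" in the output suggests), and the modelling of Tcl's fallback as a lenient UTF-8 decode are all assumptions.

```python
# Hypothetical re-creation of the decode-or-pass-through step quoted above,
# written in Python 3 (bytes stands in for a Python 2 str).
LOCALE_ENCODING = "cp1252"  # assumed code page of the reporter's machine


def as_displayed(s, encoding=LOCALE_ENCODING):
    """Return the text the output window would end up showing for s."""
    if isinstance(s, bytes):
        try:
            # Mirrors unicode(s, IOBinding.encoding): it succeeds silently
            # whenever every byte is defined in the code page.
            return s.decode(encoding)
        except UnicodeDecodeError:
            # Assumption: Tcl receives the raw bytes and reads them as UTF-8.
            return s.decode("utf-8", errors="replace")
    return s


# UTF-8 bytes of the two literals in the report.
line1 = "零 一 二 三 四 五 六 七 八".encode("utf-8")
line2 = "零 一 二 三 四 五 六 七 八 九 十 ".encode("utf-8")

# line1 uses only bytes that cp1252 defines, so it "decodes" into mojibake.
# line2 adds 九 (E4 B9 9D) and 十 (E5 8D 81); 0x9D, 0x8D and 0x81 are
# undefined in cp1252, so decoding fails and the UTF-8 fallback is used.
for line in (line1, line2):
    print(ascii(as_displayed(line)))
```

In this sketch what decides the outcome is not the string's length but whether every byte happens to be defined in the local code page.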


----------------------------------------------------------------------

Comment By: Kurt B. Kaiser (kbk)
Date: 2006-07-23 07:33

Message:
Logged In: YES 
user_id=149084

I don't have a font installed which will print
those characters.  When I load your sample file,
I see print statements which include unicode
characters like \u5341.  The printed output
contains the same unicode characters as the
input program.  Maybe Martin has an idea.

----------------------------------------------------------------------

Comment By: James (hover_boy)
Date: 2006-03-22 16:21

Message:
Logged In: YES 
user_id=1458491

I've attached an example file to demonstrate the problem 
better.

It seems not to be the length but something else, which I
haven't figured out yet.

I've also added the encoding comment and tried changing
the default encoding in sitecustomize.py from latin-1 to
utf-8, but neither seems to work.

thanks,

James.

XP Professional, SP2, English


----------------------------------------------------------------------

Comment By: James (hover_boy)
Date: 2006-03-22 16:12

Message:
Logged In: YES 
user_id=1458491




----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2006-03-06 02:44

Message:
Logged In: YES 
user_id=593130

I am fairly ignorant of Unicode and encodings, but I am
surprised you got anything coherent without an encoding 
cookie comment at the top (see manual).  Have you tried 
that?  Other questions that might help someone answer:

What specific XP version?  SP2 installed? Country version?
Your results for
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'mbcs'
What happens if you reverse the order of the print
statements?  (I.e., is it really the shorter string that
does not work, or just the first?)

I don't know enough to know if this is really a bug.  If 
you don't get an answer here, you might try for more info 
on python-list/comp.lang.python
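The encoding cookie mentioned here is the PEP 263 `coding:` comment on the first or second line of a source file. The Python 3 sketch below uses the standard library's `tokenize.detect_encoding` to show how that declaration is read; the helper name `declared_encoding` is illustrative. Note that this reflects Python 3 behaviour, where a file with no cookie and no BOM defaults to UTF-8. In Python 2 the cookie governs how `u"..."` literals are decoded, while plain `"..."` literals keep the file's raw bytes, so the cookie alone does not change what a byte-string `print` sends to IDLE.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python 3 would use for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A PEP 263 cookie on the first line, no cookie, a non-UTF-8 cookie,
# and a UTF-8 byte-order mark (as some Windows editors write).
with_cookie = b"# -*- coding: utf-8 -*-\nprint(u'\xe5\x85\xab')\n"
without_cookie = b"print(u'\xe5\x85\xab')\n"
cp1252_cookie = b"# -*- coding: cp1252 -*-\nprint('x')\n"
with_bom = b"\xef\xbb\xbfprint('x')\n"

for src in (with_cookie, without_cookie, cp1252_cookie, with_bom):
    print(declared_encoding(src))
```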

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 