On Tuesday 13 September 2005 at 17:56 +0900, Hye-Shik Chang wrote:
> On 9/11/05, Victor STINNER <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I found a bug in Python interactive command line (program python alone:
> > looks to be code.interact() function in code.py). With UTF-8 locale, the
> > command << u"é" >> returns << u'\xc3\xa9' >> and not << u'\xE9' >>.
> > Remember: the French e with acute is Unicode 233 (0xE9), encoded \xC3
> > \xA9 in UTF-8.
>
> Which version of python do you use? From 2.4, the interactive mode
> respects locale as a source code encoding and it falls back to latin-1
> when decoding fails.
>
> Python 2.4.1 (#2, Jul 31 2005, 04:45:53)
> [GCC 3.4.2 [FreeBSD] 20040728] on freebsd5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"é"
> u'\xe9'

I installed my own Python 2.4 in /opt/python/. I don't know if the right
code.py is loaded, but here is the output:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ ./python2.4
Python 2.4.1 (#1, Sep 11 2005, 01:37:26)
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"é"
u'\xe9'
>>> import code
>>> code.interact()
Python 2.4.1 (#1, Sep 11 2005, 01:37:26)
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> u"é"
u'\xc3\xa9'
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Well, that works better :-) The plain interactive prompt is fine, but
code.interact() still returns the raw UTF-8 bytes. For code.interact(),
you can read my attached patch. I don't know if it is the best way to
fix the bug.

But the following code still misbehaves in Python 2.4:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ cat python_unicode_eval_bug.py
#*- coding: UTF-8 -*-
print "One Unicode character: %u" % len(u"é")
print "One Unicode character (using eval) : %u" % eval('len(u"é")')
$ python2.4 python_unicode_eval_bug.py
One Unicode character: 1
One Unicode character (using eval) : 2
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

RexFi explained to me that Python can't guess the charset of the string
passed to eval('len(u"é")'). Yep, that's difficult: locale? charset
encoding? So this test doesn't matter.

@+, Haypo
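
To illustrate the eval() result above, here is a minimal sketch of how the same two bytes give a length of 1 or 2 depending on the charset used to decode them. It is written for Python 3 (an assumption; the report concerns Python 2.4, where those bytes would sit inside a plain str), and the variable names are only illustrative.

```python
# Python 3 sketch (assumption: not the Python 2.4 from the report).
# "\u00e9" is the e with acute discussed above.
raw = "\u00e9".encode("utf-8")     # the two bytes a UTF-8 terminal sends

as_utf8 = raw.decode("utf-8")      # right charset: one character, U+00E9
as_latin1 = raw.decode("latin-1")  # wrong guess: one character per byte

print(len(raw), len(as_utf8), len(as_latin1))
print(ascii(as_utf8), ascii(as_latin1))
```

The Latin-1 reading yields the same u'\xc3\xa9' shape that code.interact() printed, which is consistent with the bytes being taken one character per byte rather than decoded as UTF-8.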

--- /usr/lib/python2.3/code.py	2005-08-30 18:02:31.000000000 +0200
+++ code.py	2005-09-12 14:37:14.000000000 +0200
@@ -232,6 +232,7 @@
                     prompt = sys.ps1
                 try:
                     line = self.raw_input(prompt)
+                    line = unicode(line, sys.stdin.encoding)
                 except EOFError:
                     self.write("\n")
                     break
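
The decoding step added by the patch above can be sketched on its own, outside code.py. This is only an illustration in Python 3, not the patch itself: decode_line is a hypothetical helper, and the Latin-1 fallback is an assumption modelled on the behaviour Hye-Shik describes for the 2.4 interactive mode.

```python
def decode_line(raw, encoding=None):
    """Decode one line of console input (hypothetical helper).

    Tries the given encoding first; falls back to Latin-1, which can
    decode any byte sequence, when the encoding is missing, unknown,
    or does not match the bytes.
    """
    if encoding is None:
        # Assumption: no encoding is known for the input stream.
        return raw.decode("latin-1")
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return raw.decode("latin-1")


print(ascii(decode_line(b"\xc3\xa9", "utf-8")))  # UTF-8 input, UTF-8 locale
print(ascii(decode_line(b"\xe9", "utf-8")))      # Latin-1 input, fallback path
```

Unlike the one-line patch, this version does not fail when no encoding is available or when the input is not valid in the expected encoding; whether that is the right trade-off for code.interact() is an open question.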

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com