On Tuesday, 13 September 2005 at 17:56 +0900, Hye-Shik Chang wrote:
> On 9/11/05, Victor STINNER <[EMAIL PROTECTED]> wrote:
> > Hi,
> > 
> > I found a bug in Python interactive command line (program python alone:
> > looks to be code.interact() function in code.py). With UTF-8 locale, the
> > command << u"é" >> returns << u'\xc3\xa9' >> and not << u'\xE9' >>.
> > Remember: the french e with acute is Unicode 233 (0xE9), encoded \xC3
> > \xA9 in UTF-8.
> 
> Which version of python do you use?  From 2.4, the interactive mode
> respects locale as a source code encoding and it falls back to latin-1
> when decoding fails.
> 
> Python 2.4.1 (#2, Jul 31 2005, 04:45:53)
> [GCC 3.4.2 [FreeBSD] 20040728] on freebsd5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"é"
> u'\xe9'

I installed my own Python 2.4 in /opt/python/. I don't know if the right
code.py is loaded, but here is the output:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ ./python2.4 
Python 2.4.1 (#1, Sep 11 2005, 01:37:26) 
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"é"
u'\xe9'
>>> import code
>>> code.interact()
Python 2.4.1 (#1, Sep 11 2005, 01:37:26) 
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> u"é"
u'\xc3\xa9'
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Well, that works better :-) For code.interact(), you can read my
attached patch. I don't know if it is the best way to fix the bug.
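
To make the idea of the patch concrete, here is a minimal sketch in present-day Python 3 rather than the Python 2.4 discussed in this thread. The helper name `decode_console_line` is hypothetical, and the Latin-1 fallback is an assumption borrowed from the behaviour Hye-Shik describes for the interactive mode; the patch itself has no fallback. The sketch also guards against a missing encoding, since `sys.stdin.encoding` can be `None` when input does not come from a terminal.

```python
from typing import Optional


def decode_console_line(raw: bytes, encoding: Optional[str]) -> str:
    """Decode one raw input line using the console's encoding (hypothetical helper)."""
    try:
        # Mirrors `unicode(line, sys.stdin.encoding)` from the patch.
        # Assumption: if no encoding is known, use Latin-1.
        return raw.decode(encoding or "latin-1")
    except (UnicodeDecodeError, LookupError):
        # Assumed fallback: Latin-1 maps every byte to exactly one character,
        # so this decode cannot fail.
        return raw.decode("latin-1")


# The bytes a UTF-8 terminal sends for the four characters u"é"
line = decode_console_line(b'u"\xc3\xa9"', "utf-8")
print(len(line))  # 4
```

With the line decoded before it is compiled, the literal holds one character instead of two bytes treated as two characters.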

However, the following code still misbehaves in Python 2.4:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
$ cat python_unicode_eval_bug.py 
#*- coding: UTF-8 -*-
print "One Unicode character: %u" % len(u"é")
print "One Unicode character (using eval) : %u" % eval('len(u"é")')
$ python2.4 python_unicode_eval_bug.py 
One Unicode character: 1
One Unicode character (using eval) : 2
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

RexFi explained to me that Python can't guess the charset of
eval('len(u"é")'). Yep, that's difficult: the locale? the source encoding?
So this test doesn't really matter.
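
A small sketch of the arithmetic behind the two results, written in Python 3 rather than 2.4. It assumes the string handed to eval() carries the UTF-8 bytes of the source file with no declared encoding, and that each byte then ends up as one character, as it would under Latin-1. That reading matches the observed output, but it is an interpretation, not something confirmed in this thread.

```python
# The UTF-8 bytes for "é" (U+00E9), as stored in the source file
raw = "\u00e9".encode("utf-8")
assert raw == b"\xc3\xa9"

# Decoded with the declared source encoding: one character
as_utf8 = raw.decode("utf-8")

# Decoded byte-for-byte (Latin-1): two characters
as_latin1 = raw.decode("latin-1")

print(len(as_utf8))    # 1
print(len(as_latin1))  # 2
```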

See you, Haypo
--- /usr/lib/python2.3/code.py	2005-08-30 18:02:31.000000000 +0200
+++ code.py	2005-09-12 14:37:14.000000000 +0200
@@ -232,6 +232,7 @@
                     prompt = sys.ps1
                 try:
                     line = self.raw_input(prompt)
+                    line = unicode(line, sys.stdin.encoding)
                 except EOFError:
                     self.write("\n")
                     break
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev