[ python-Bugs-999444 ] compiler module doesn't support unicode characters in literals

SourceForge.net Sat, 25 Feb 2006 14:00:48 -0800

Bugs item #999444, was opened at 2004-07-28 07:00
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=999444&group_id=5470


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Jim Fulton (dcjim)
>Assigned to: Jeremy Hylton (jhylton)
Summary: compiler module doesn't support unicode characters in literals

Initial Comment:
I'm not positive that this is a bug. The built-in
compile function accepts unicode with non-ASCII text in
literals:

>>> text = u"print u'''\u0442\u0435\u0441\u0442'''"
>>> exec compile(text, 's', 'exec')
тест
>>> import compiler
>>> exec compiler.compile(text, 's', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 64, in compile
    gen.compile()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 111, in compile
    tree = self._get_tree()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 77, in _get_tree
    tree = parse(self.source, self.mode)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 50, in parse
    return Transformer().parsesuite(buf)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 120, in parsesuite
    return self.transform(parser.suite(text))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)
>>> 
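
The `compiler` package was removed in Python 3, but its parse-then-generate pipeline survives as the `ast` module plus the built-in `compile()`. A minimal Python 3 sketch, assuming `ast.parse` is a fair stand-in for `compiler.transformer`, shows that route accepting the same non-ASCII literal from str source; the variable name `result` is illustrative only.

```python
import ast

# Same Cyrillic literal as in the report, written with \u escapes.
text = "result = '\u0442\u0435\u0441\u0442'\n"

# Step 1: parse the str source into a tree (roughly what compiler.transformer did).
tree = ast.parse(text, filename="s", mode="exec")

# Step 2: generate a code object from the tree and run it.
code = compile(tree, "s", "exec")
namespace = {}
exec(code, namespace)

# The non-ASCII literal survives the round trip unchanged.
assert namespace["result"] == "\u0442\u0435\u0441\u0442"
```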

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2006-02-25 14:00

Message:
Logged In: YES 
user_id=33168

FYI

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-29 04:38

Message:
Logged In: YES 
user_id=38388

Note that the tokenizer converts the input string into UTF-8
(transcoding it as necessary if a source code encoding declaration
is found) and the compiler will assume this encoding when
creating Unicode literals.

I'm not sure whether the compiler package is up to date with respect to
these internal changes in the C-based compiler.
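
The effect described above can be observed from outside the tokenizer: whatever encoding the source bytes declare, the resulting string literal is the same. A minimal sketch, assuming Python 3, where the built-in `compile()` accepts `bytes` and honours a PEP 263 coding comment; it checks the observable result rather than the C internals, and the helper `literal_from_source` is hypothetical.

```python
def literal_from_source(encoding: str) -> str:
    """Compile a tiny module stored in `encoding` and return its string literal."""
    # Build the module as text with a PEP 263 coding comment on line 1.
    text = "# coding: %s\nvalue = 'caf\u00e9'\n" % encoding
    # Encode it the way a file saved in that encoding would be stored on disk.
    source_bytes = text.encode(encoding)
    namespace = {}
    # compile() reads the coding comment and decodes the bytes accordingly.
    exec(compile(source_bytes, "<source>", "exec"), namespace)
    return namespace["value"]


# The raw bytes for "é" differ between the two encodings...
assert "\u00e9".encode("utf-8") != "\u00e9".encode("latin-1")

# ...but the literal the compiler builds is the same either way.
results = {enc: literal_from_source(enc) for enc in ("utf-8", "latin-1")}
assert results["utf-8"] == results["latin-1"] == "caf\u00e9"
```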

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-07-29 04:30

Message:
Logged In: YES 
user_id=6656

Thinking about this a little harder, doing a proper job probably
involves mucking around in the depths of Python to support
source-as-unicode throughout. The vile solution is this sort of
thing:

>>> parser.suite('# coding: utf-8\n' + u"print u'''\u0442\u0435\u0441\u0442'''".encode('utf-8'))
<parser.st object at 0x107770>
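
Wrapped up as a helper, the workaround is: prepend a coding comment, encode the unicode text to UTF-8, and hand the bytes to the parser. This is a Python 3 adaptation under stated assumptions: `parser.suite` no longer exists (the `parser` module was removed in Python 3.10), so the bytes go to the built-in `compile()` instead, and the function name `compile_unicode_source` is hypothetical.

```python
def compile_unicode_source(text: str, filename: str = "<string>"):
    """Compile unicode source by routing it through UTF-8 bytes plus a coding comment."""
    # Mirror the workaround: declare the encoding explicitly, then encode to match.
    # Caveat: the extra first line shifts reported line numbers by one.
    source_bytes = ("# coding: utf-8\n" + text).encode("utf-8")
    return compile(source_bytes, filename, "exec")


# Usage with the literal from the report (assignment instead of a print statement).
text = "value = u'''\u0442\u0435\u0441\u0442'''\n"
namespace = {}
exec(compile_unicode_source(text, "s"), namespace)
assert namespace["value"] == "\u0442\u0435\u0441\u0442"
```

The line-number shift is the main reason this is a stopgap rather than a fix: tracebacks point one line past the real location.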


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-07-29 04:19

Message:
Logged In: YES 
user_id=6656

The immediate problem is that the parser module does not support
unicode:

>>> import parser
>>> parser.suite(u"print u'''\u0442\u0435\u0441\u0442'''")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)

There may well be more bugs lurking in Lib/compiler with respect to this
issue, but this is the first... I don't know how easy this will be to
fix (looking at what the built-in compile() function does with
unicode might be a good start).

----------------------------------------------------------------------

Comment By: Jim Fulton (dcjim)
Date: 2004-07-28 07:02

Message:
Logged In: YES 
user_id=73023

Also in 2.3

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 