Bugs item #999444, was opened at 2004-07-28 07:00
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=999444&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Jim Fulton (dcjim)
>Assigned to: Jeremy Hylton (jhylton)
Summary: compiler module doesn't support unicode characters in literals

Initial Comment:
I'm not positive that this is a bug. The built-in compile
function accepts unicode with non-ascii text in literals:

>>> text = u"print u'''\u0442\u0435\u0441\u0442'''"
>>> exec compile(text, 's', 'exec')
тест
>>> import compiler
>>> exec compiler.compile(text, 's', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 64, in compile
    gen.compile()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 111, in compile
    tree = self._get_tree()
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 77, in _get_tree
    tree = parse(self.source, self.mode)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 50, in parse
    return Transformer().parsesuite(buf)
  File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 120, in parsesuite
    return self.transform(parser.suite(text))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)
>>>

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2006-02-25 14:00

Message:
Logged In: YES
user_id=33168

FYI

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-29 04:38

Message:
Logged In: YES
user_id=38388

Note that the tokenizer converts the input string into UTF-8
(transcoding it as necessary if a source code encoding
shebang is found) and the compiler will assume this encoding
when creating Unicode literals.

I'm not sure whether the compiler package is up-to-date w/r
to these internal changes in the C-based compiler.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-07-29 04:30

Message:
Logged In: YES
user_id=6656

thinking about this a little harder, doing a proper job
probably involves mucking around in the depths of python to
support source-as-unicode throughout.

the vile solution is this sort of thing:

>>> parser.suite('# coding: utf-8\n' + u"print u'''\u0442\u0435\u0441\u0442'''".encode('utf-8'))
<parser.st object at 0x107770>

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-07-29 04:19

Message:
Logged In: YES
user_id=6656

the immediate problem is that the parser module doesn't
support unicode:

>>> import parser
>>> parser.suite(u"print u'''\u0442\u0435\u0441\u0442'''")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)

there may well be more bugs lurking in Lib/compiler wrt this
issue, but this is the first...

I don't know how easy this will be to fix (looking at what
the builtin compile() function does with unicode might be a
good start).

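For readers who want to try the workaround from the 04:30 comment (prepend a coding line, then encode the unicode source to UTF-8), here is a minimal sketch. It is not taken from the thread: `to_utf8_source` is a hypothetical helper name, the sample source uses an assignment instead of the print statement so it is valid on both Python 2 and Python 3, and the built-in compile() stands in for the parser and compiler modules, which no longer exist in recent Python versions.

```python
# Sketch only: instead of handing a unicode string to a parser that tries
# to encode it as ASCII, hand it UTF-8 bytes with an explicit coding line.

def to_utf8_source(text):
    """Return `text` as UTF-8 bytes prefixed with a coding line.

    `to_utf8_source` is a hypothetical helper, not part of Python.
    The extra first line shifts reported line numbers by one.
    """
    return b"# coding: utf-8\n" + text.encode("utf-8")


# An assignment replaces the thread's print statement so the sample
# source parses on both Python 2 and Python 3 (an assumption of this sketch).
text = u"s = u'''\u0442\u0435\u0441\u0442'''"

namespace = {}
code = compile(to_utf8_source(text), "s", "exec")
exec(code, namespace)  # namespace["s"] now holds the Cyrillic literal
```

On the Python 2 versions discussed here, the same bytes are what the 04:30 comment passes to parser.suite(). Whether the rest of Lib/compiler handles the result correctly is not established by this sketch.
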
----------------------------------------------------------------------

Comment By: Jim Fulton (dcjim)
Date: 2004-07-28 07:02

Message:
Logged In: YES
user_id=73023

Also in 2.3

----------------------------------------------------------------------