Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

Ron Adam Thu, 25 Nov 2010 09:25:07 -0800


On 11/25/2010 08:30 AM, Emile Anclin wrote:


hello,

working on Pylint, we have a lot of deliberately corrupted files to test
Pylint's behavior; for instance:

$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
# -*- coding: IBO-8859-1 -*-
""" check correct unknown encoding declaration
"""

__revision__ = 'éééé'


and we try to find that module with
find_module('func_unknown_encoding', None). But Python 3 raises SyntaxError
in that case; it didn't raise SyntaxError on Python 2, nor does it do so on
our func_nonascii_noencoding and func_wrong_encoding modules (with obvious
names).

Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from imp import find_module
>>> find_module('func_unknown_encoding', None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SyntaxError: encoding problem: with BOM

>>> find_module('func_wrong_encoding', None)
(<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py',
('.py', 'U', 1))

>>> find_module('func_nonascii_noencoding', None)
(<_io.TextIOWrapper name=6 encoding='utf-8'>,
'func_nonascii_noencoding.py', ('.py', 'U', 1))


So what is the reason for this selective behavior?
Furthermore, there is BOM in our func_unknown_encoding.py module.

I don't think there is a clear reason by design. Also try importing the
same modules directly and noting the differences in the errors you get.


For example, here is the problem that brought this to my attention in
Python 3.2:

>>> find_module('test/badsyntax_pep3120')
Segmentation fault

>>> from test import badsyntax_pep3120
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1

SyntaxError: Non-UTF-8 code starting with '\xf6' in file /usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The import statement uses parser.c, and tokenizer.c indirectly, to import a
file, but the imp module uses tokenizer.c directly. They aren't consistent
in how they handle errors because the different error messages are
generated in different places depending on what the error is, *and* what
the code path to get to that point was, *and* whether or not a filename was
set. For the example above with imp.find_module(), the filename isn't set,
so you get a different error than if you used import, which uses the parser
module, and that does set the filename.
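
The same effect can be reproduced on a current Python 3. This is only a rough sketch, under some assumptions: the imp module was removed in Python 3.12, so it uses tokenize.detect_encoding() from the pure-Python tokenize module as a stand-in for a caller that reads the encoding declaration directly. That is not the tokenizer.c code path discussed above, and the exact messages vary between Python versions. The helper names and the temporary file are hypothetical; the file contents are modelled on func_unknown_encoding.py from the quoted message.

```python
import importlib
import io
import os
import sys
import tempfile
import tokenize

# Hypothetical test module, modelled on func_unknown_encoding.py above.
SOURCE = (
    b"# -*- coding: IBO-8859-1 -*-\n"
    b'""" check correct unknown encoding declaration\n"""\n'
    b"\n"
    b"__revision__ = '\xe9\xe9\xe9\xe9'\n"
)


def error_from_import(name, directory):
    """Return the SyntaxError raised by importing `name`, or None."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        importlib.import_module(name)
    except SyntaxError as exc:
        return exc
    finally:
        sys.path.remove(directory)
        sys.modules.pop(name, None)
    return None


def error_from_tokenize(readline):
    """Return the SyntaxError raised by tokenize.detect_encoding, or None."""
    try:
        tokenize.detect_encoding(readline)
    except SyntaxError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "func_unknown_encoding.py")
    with open(path, "wb") as f:
        f.write(SOURCE)

    # Code path 1: the import machinery, which knows the filename.
    import_error = error_from_import("func_unknown_encoding", tmp)

    # Code path 2: reading the declaration from a named file object.
    with open(path, "rb") as f:
        named_error = error_from_tokenize(f.readline)

    # Code path 3: same bytes, but the reader has no filename at all.
    anonymous_error = error_from_tokenize(io.BytesIO(SOURCE).readline)

for label, err in [
    ("import", import_error),
    ("tokenize, filename known", named_error),
    ("tokenize, no filename", anonymous_error),
]:
    print("%-26s %s: %s" % (label, type(err).__name__, err))
```

All three calls fail on the same defect, but the wording of the error depends on which code path was taken and on whether that path had a filename to report.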

From what I've seen, it would help if the imp module was rewritten to use
parser.c like the import statement does, rather than tokenizer.c directly.
The error handling in parser.c is much better than tokenizer.c. Possibly
tokenizer.c could be cleaned up after that and be made much simpler.


Ron Adam