[issue10509] PyTokenizer_FindEncoding can lead to a segfault if bad characters are found

Andreas Stührk Mon, 22 Nov 2010 10:37:48 -0800

New submission from Andreas Stührk <andy-pyt...@hammerhartes.de>:

If a non-ASCII character is found and there is no encoding cookie, a
SyntaxError is raised (in `decoding_fgets`) whose message includes the path of
the file (taken from ``tok->filename``), but that path is never set. You can
easily reproduce the crash by calling `imp.find_module("badsyntax")`, where
"badsyntax" is a Python file containing a non-ASCII character (see e.g. the
attached unit test), as `find_module` uses `PyTokenizer_FindEncoding`. Note
that Python 3.1 uses `snprintf()` to format the error message, and some
implementations of `snprintf()` explicitly check for null pointers, so the
crash might not occur on every platform.
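
To make the triggering input concrete, here is a minimal sketch. It uses the pure-Python `tokenize.detect_encoding` as a stand-in, not the C function `PyTokenizer_FindEncoding` itself, so it only shows the intended behaviour (a SyntaxError rather than a crash). The helper name `check_encoding` and the sample source are made up for illustration, and the sample uses a Latin-1 byte that is not valid UTF-8.

```python
import io
import tokenize

# Source containing a Latin-1 encoded "e acute" (byte 0xE9) and no encoding
# cookie. The byte is not valid UTF-8, so the default encoding cannot decode it.
BAD_SOURCE = b"name = '\xe9'\n"

# The same statement, preceded by an explicit encoding cookie.
GOOD_SOURCE = b"# -*- coding: latin-1 -*-\nname = '\xe9'\n"


def check_encoding(source):
    """Return the detected encoding, or a description of the error raised.

    Hypothetical helper: it wraps tokenize.detect_encoding, the pure-Python
    counterpart of the encoding detection discussed in this report.
    """
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError as exc:
        return "SyntaxError: {}".format(exc.msg)
    return encoding


if __name__ == "__main__":
    print(check_encoding(BAD_SOURCE))
    print(check_encoding(GOOD_SOURCE))
```

With the cookie present the encoding is detected normally; without it, detection fails with a SyntaxError, which is the error path where the C code formats a message from the unset filename.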


One possible fix is to set ``tok->filename`` to something like "<unknown>".
Attached is a patch which does that and adds a unit test for imp.
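
The idea behind the fix can be modelled in a few lines. This is only a Python sketch of the fallback pattern, not the attached C patch: the function name `format_decoding_error` and the message wording are assumptions made for illustration, with `None` standing in for the unset ``tok->filename``.

```python
from typing import Optional

# Placeholder used when no filename is known, as suggested in the report.
UNKNOWN_FILENAME = "<unknown>"


def format_decoding_error(filename: Optional[str], lineno: int) -> str:
    """Build an error message that never dereferences a missing filename.

    Hypothetical helper: None models a tok->filename that was never set.
    The message text is illustrative, not CPython's actual wording.
    """
    name = filename if filename is not None else UNKNOWN_FILENAME
    return "Non-UTF-8 code in file {} on line {}, but no encoding declared".format(
        name, lineno
    )


if __name__ == "__main__":
    print(format_decoding_error(None, 1))
    print(format_decoding_error("badsyntax.py", 1))
```

Substituting the placeholder before formatting means the message is built the same way whether or not the caller supplied a path.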

----------
components: Interpreter Core
messages: 122153
nosy: Trundle
priority: normal
severity: normal
status: open
title: PyTokenizer_FindEncoding can lead to a segfault if bad characters are found
type: crash
versions: Python 3.1, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10509>
_______________________________________