Ezio Melotti added the comment:
This is now fixed, thanks for the report!
This should be fixed, and the behavior of _run_check should probably be
changed too -- maybe it could test both the char-by-char and the
regular feeding.
I created #20623 to track this.
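
A minimal sketch of what testing both feeding modes could look like, written against Python 3's html.parser rather than the 2.7 module that was patched. EventCollector, collect and feed_both_ways are hypothetical names used for illustration only; they are not the helpers in Lib/test/test_htmlparser.py. Adjacent text events are merged because a parser fed one character at a time may report text in several pieces.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as tuples."""

    def __init__(self):
        # Assumption: convert_charrefs=False, so that handle_charref()
        # calls stay visible instead of being folded into text.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_data(self, data):
        # Merge adjacent text so chunk boundaries alone do not
        # change the recorded events.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


def collect(source, char_by_char):
    """Feed source in one chunk or one character at a time."""
    parser = EventCollector()
    chunks = source if char_by_char else [source]
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.events


def feed_both_ways(source):
    """Return (whole_chunk_events, char_by_char_events) for comparison."""
    return collect(source, False), collect(source, True)


whole, chars = feed_both_ways("<p>x &#62; y</p>")
print(whole)
print(chars)
```

A check built on this idea would compare the two lists, so a difference between the feeding modes like the one in this report shows up as a test failure.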
--
resolution: -
Ezio Melotti added the comment:
Here's a patch against 2.7.
--
keywords: +patch
Added file: http://bugs.python.org/file33845/issue20288.diff
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20288
Roundup Robot added the comment:
New changeset 0d50b5851f38 by Ezio Melotti in branch '2.7':
#20288: fix handling of invalid numeric charrefs in HTMLParser.
http://hg.python.org/cpython/rev/0d50b5851f38
New changeset 32097f193892 by Ezio Melotti in branch '3.3':
#20288: fix handling of invalid
New submission from Anders Hammarquist:
Python 2.7 HTMLParser.py lines 185-199 (similar lines still exist in Python 3.4)
    match = charref.match(rawdata, i)
    if match:
        ...
    else:
        if ";" in rawdata[i:]: #bail by
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
assignee:  -> ezio.melotti
nosy: +ezio.melotti
Ezio Melotti added the comment:
Thanks for the report, this is indeed a bug.
This behavior was covered by a test (see Lib/test/test_htmlparser.py:164), but
_run_check feeds the chars one by one to the parser, and in that case it works
correctly. While feeding the parser a whole chunk I was