[issue20288] HTMLParse handing of non-numeric charrefs broken

2014-02-13 Thread Ezio Melotti

Ezio Melotti added the comment:

This is now fixed, thanks for the report!

> This should be fixed, and the behavior of _run_check should probably be
> changed too -- maybe it could test both the char-by-char and the
> regular feeding.

I created #20623 to track this.

--
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20288
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




2014-02-01 Thread Ezio Melotti

Ezio Melotti added the comment:

Here's a patch against 2.7.

--
keywords: +patch
Added file: http://bugs.python.org/file33845/issue20288.diff





2014-02-01 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 0d50b5851f38 by Ezio Melotti in branch '2.7':
#20288: fix handling of invalid numeric charrefs in HTMLParser.
http://hg.python.org/cpython/rev/0d50b5851f38

New changeset 32097f193892 by Ezio Melotti in branch '3.3':
#20288: fix handling of invalid numeric charrefs in HTMLParser.
http://hg.python.org/cpython/rev/32097f193892

New changeset 92b3928bfde1 by Ezio Melotti in branch 'default':
#20288: merge with 3.3.
http://hg.python.org/cpython/rev/92b3928bfde1

--
nosy: +python-dev





2014-01-17 Thread Anders Hammarquist

New submission from Anders Hammarquist:

Python 2.7 HTMLParser.py lines 185-199 (similar lines still exist in Python 3.4):
    match = charref.match(rawdata, i)
    if match:
        ...
    else:
        if ";" in rawdata[i:]: #bail by consuming &#
            self.handle_data(rawdata[0:2])
            i = self.updatepos(i, 2)
        break

If you feed a broken charref, i.e. a non-numeric one, it will pass whatever
string happens to be at the start of rawdata to handle_data(). E.g.:

    import sys
    from HTMLParser import HTMLParser  # Python 2.7 (html.parser in Python 3)

    p = HTMLParser()
    p.handle_data = lambda x: sys.stdout.write(x)
    p.feed('<p>&#foo;</p>')  # "&#foo;" is a non-numeric charref

will print '<p', which is clearly wrong. I think the intention of the code is
to pass '&#', which seems saner.
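The index arithmetic behind the report can be illustrated with a small, self-contained sketch. It is not the HTMLParser source: `bail_out_buggy` and `bail_out_fixed` are hypothetical helpers that mimic only the quoted `else` branch, returning the text that would be passed to handle_data() together with the new parse position. The "fixed" variant, which slices relative to `i`, is an assumption about the shape of a correction, not a copy of the committed patch.

```python
def bail_out_buggy(rawdata, i):
    """Mimic the quoted 2.7 branch for a non-numeric charref at index i."""
    if ';' not in rawdata[i:]:
        return None  # no ';' yet: the quoted branch emits nothing and breaks
    # rawdata[0:2] ignores i, so the data is whatever the buffer starts with,
    # and the new position (2) can even lie before i.
    return rawdata[0:2], 2


def bail_out_fixed(rawdata, i):
    """Hypothetical corrected branch: emit the two chars at i, i.e. '&#'."""
    if ';' not in rawdata[i:]:
        return None
    return rawdata[i:i + 2], i + 2


rawdata = '<p>&#foo;</p>'
i = rawdata.index('&#')  # 3: start of the broken, non-numeric charref

print(bail_out_buggy(rawdata, i))  # ('<p', 2)
print(bail_out_fixed(rawdata, i))  # ('&#', 5)
```

Because updatepos() returns its second argument, the quoted `i = self.updatepos(i, 2)` also moves the parse position to 2 rather than to i + 2.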

--
components: Library (Lib)
messages: 208336
nosy: iko
priority: normal
severity: normal
status: open
title: HTMLParse handing of non-numeric charrefs broken
type: behavior





2014-01-17 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
assignee:  -> ezio.melotti
nosy: +ezio.melotti





2014-01-17 Thread Ezio Melotti

Ezio Melotti added the comment:

Thanks for the report; this is indeed a bug.
This behavior was covered by a test (see Lib/test/test_htmlparser.py:164), but
_run_check feeds the chars one by one to the parser, and in that case it works
correctly. By feeding the parser a whole chunk, I was able to reproduce the
bug. This should be fixed, and the behavior of _run_check should probably be
changed too -- maybe it could test both the char-by-char and the regular
feeding.
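The idea of checking both feeding modes can be sketched as follows. This is a minimal illustration written against a modern Python 3 `html.parser.HTMLParser` (with the default `convert_charrefs=True`), not the code in Lib/test/test_htmlparser.py; `EventCollector`, `events_whole`, `events_char_by_char` and `check_both_ways` are hypothetical names. Because char-by-char feeding can split text across several handle_data() calls, adjacent data events are merged before the two runs are compared.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record parser callbacks as tuples, merging adjacent data chunks."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True is the default in 3.5+
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('starttag', tag))

    def handle_endtag(self, tag):
        self.events.append(('endtag', tag))

    def handle_data(self, data):
        # Char-by-char feeding may deliver text in pieces; merge them so the
        # event lists of the two feeding modes are comparable.
        if self.events and self.events[-1][0] == 'data':
            self.events[-1] = ('data', self.events[-1][1] + data)
        else:
            self.events.append(('data', data))


def events_whole(source):
    """Feed the whole document in a single chunk."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


def events_char_by_char(source):
    """Feed the document one character at a time."""
    parser = EventCollector()
    for char in source:
        parser.feed(char)
    parser.close()
    return parser.events


def check_both_ways(source):
    """Hypothetical _run_check variant: both feeding modes must agree."""
    whole = events_whole(source)
    chunked = events_char_by_char(source)
    assert whole == chunked, (whole, chunked)
    return whole


print(check_both_ways('<p>&#foo;</p>'))
```

Running every test input through both feeding modes would have exposed the discrepancy described above, since the buggy branch only misbehaves when '&#' is not at the start of the buffer.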

--
nosy: +r.david.murray
stage:  -> needs patch
versions: +Python 2.7, Python 3.3, Python 3.4
