[issue32876] HTMLParser raises exception on some inputs

Hanno Boeck Mon, 19 Feb 2018 11:52:32 -0800

New submission from Hanno Boeck <[email protected]>:

I noticed that HTMLParser raises an exception on some inputs.
I'm not sure what the expectations are here, but given that real-world HTML
often contains all kinds of broken content, I would expect HTMLParser to
always attempt to parse a document rather than be interrupted by an exception
when it encounters an error.


Here's a minified example:
#!/usr/bin/env python3
import html.parser
html.parser.HTMLParser().feed("<![\n")

I originally ran into the failure while parsing a real web page:
https://kafanews.com/

Exception raised by the minified example:

Traceback (most recent call last):
  File "./foo.py", line 5, in <module>
    html.parser.HTMLParser().feed("<![\n")
  File "/usr/lib64/python3.6/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib64/python3.6/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib64/python3.6/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib64/python3.6/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
  File "/usr/lib64/python3.6/_markupbase.py", line 391, in _scan_name
    % rawdata[declstartpos:declstartpos+20])
  File "/usr/lib64/python3.6/_markupbase.py", line 34, in error
    "subclasses of ParserBase must override error()")
NotImplementedError: subclasses of ParserBase must override error()
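Until the parser handles this input itself, a caller processing arbitrary pages can at least keep one malformed document from aborting a whole run. The following is a minimal sketch, assuming it is acceptable to flag or skip a document the parser cannot handle. `TagCollector` and `safe_feed` are hypothetical names used only for illustration; they are not part of html.parser. Which exception (if any) "<![\n" triggers may differ between Python versions, so the sketch catches `Exception` broadly.

```python
import html.parser


class TagCollector(html.parser.HTMLParser):
    """Hypothetical subclass: records the name of every start tag reported."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def safe_feed(parser, data):
    """Hypothetical helper: feed data and close the parser, returning
    (ok, error) instead of letting a parser exception escape."""
    try:
        parser.feed(data)
        parser.close()
    except Exception as exc:  # e.g. NotImplementedError on Python 3.6
        return False, exc
    return True, None


if __name__ == "__main__":
    for sample in ["<p>hello <b>world</b></p>", "<![\n"]:
        parser = TagCollector()
        ok, err = safe_feed(parser, sample)
        print(repr(sample), "ok" if ok else "failed: %r" % (err,), parser.tags)
```

This only contains the failure; it does not recover the rest of the document after the point where the parser gave up.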

----------
components: Library (Lib)
messages: 312363
nosy: hanno
priority: normal
severity: normal
status: open
title: HTMLParser raises exception on some inputs
type: behavior
versions: Python 3.6

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue32876>
_______________________________________