New submission from Mark Nottingham <[email protected]>:
In markupbase.py's ParserBase.parse_declaration, an unexpected character is
caught like this:
        else:
            self.error(
                "unexpected %r char in declaration" % rawdata[j])
However, the position (j) isn't updated, so if error() returns instead of
raising, the loop sees the same character and calls error() again.
For example, this declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
http://www.w3.org/TR/html4/loose.dtd>
(which I think is generated by MS Office) will trigger this behaviour.
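To make the failure mode concrete, here is a minimal sketch of a scanning loop with the same shape. It is a simplified stand-in, not the markupbase source: the names scan, ALLOWED and max_steps are made up for illustration, and the step cap exists only so the demonstration terminates.

```python
# Simplified stand-in for the scanning loop; not the markupbase source.
# ALLOWED, scan() and max_steps are hypothetical names for this sketch.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "0123456789-_. \"")


def scan(rawdata, error, max_steps=10):
    """Scan up to '>' and return the index after it, or -1 if we give up."""
    j = 0
    steps = 0
    while j < len(rawdata) and steps < max_steps:  # cap so the demo ends
        steps += 1
        c = rawdata[j]
        if c == ">":
            return j + 1
        if c in ALLOWED:
            j += 1
        else:
            # The handler returns instead of raising, and j is left
            # untouched, so the next iteration sees the same character.
            error("unexpected %r char in declaration" % c)
    return -1


calls = []
result = scan("http://x>", calls.append)
print(result, len(calls), set(calls))
```

Every recorded message refers to the same ':' character, and the scan never reaches the closing '>'. Without the cap the loop would not terminate.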
Two possible resolutions:
1) increment j and try the next character in this case
2) document that error() is not recoverable; i.e., it MUST raise an exception.
My preference is strongly for #1 (as HTML parsing should be forgiving, and
HTMLParser is based upon markupbase).
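A rough sketch of what resolution #1 could look like, written as a self-contained toy scanner rather than a patch. The function scan_declaration, the report callback and both regular expressions are assumptions made for illustration; they only approximate what markupbase does.

```python
import re

# Hypothetical helpers for this sketch; the patterns are simplified
# assumptions, not the ones used by markupbase.
_NAME = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*\s*")
_STRING = re.compile(r'"[^"]*"\s*')


def scan_declaration(rawdata, report):
    """Return the index just past '>' or -1 if the declaration is incomplete."""
    j = 2  # skip the leading "<!"
    n = len(rawdata)
    while j < n:
        c = rawdata[j]
        if c == ">":
            return j + 1
        if c.isspace():
            j += 1
            continue
        pattern = _STRING if c == '"' else _NAME
        m = pattern.match(rawdata, j)
        if m:
            j = m.end()
        elif c == '"':
            return -1  # unterminated string literal
        else:
            report("unexpected %r char in declaration" % c)
            j += 1  # resolution 1: step past the offending character
    return -1


decl = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
        'http://www.w3.org/TR/html4/loose.dtd>')
errors = []
end = scan_declaration(decl, errors.append)
print(end == len(decl), errors)
```

With the declaration from this report, the sketch reports the ':' and each '/' of the unquoted URL once and still reaches the closing '>', so a non-raising error handler no longer stalls the scan.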
----------
components: Library (Lib)
messages: 106938
nosy: mnot
priority: normal
severity: normal
status: open
title: markerbase declaration errors aren't recoverable
type: behavior
versions: Python 2.6
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8885>
_______________________________________