New submission from Mark Nottingham <m...@mnot.net>:

In markupbase.py's ParserBase.parse_declaration, an unexpected character is 
caught like this:

            else:
                self.error(
                    "unexpected %r char in declaration" % rawdata[j])

However, the position (j) isn't advanced, so if error() returns instead of 
raising, the loop sees the same character again and calls error() again, 
indefinitely.
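The effect can be reproduced with a toy loop. This is only a sketch, not the 
real markupbase code: scan_declaration, its allowed character set and the 
max_steps cap are made up for illustration (the cap exists only so the demo 
terminates).

```python
def scan_declaration(rawdata, j, on_error, max_steps=20):
    """Toy stand-in for the scanning loop; not the real markupbase code."""
    allowed = " \"'-/."           # hypothetical set of harmless characters
    steps = 0
    while j < len(rawdata) and steps < max_steps:
        steps += 1
        c = rawdata[j]
        if c == ">":
            return j + 1          # end of declaration
        if c.isalnum() or c in allowed:
            j += 1                # normal progress
        else:
            # Mirrors the reported branch: report, but leave j unchanged.
            on_error("unexpected %r char in declaration" % c)
    return -1                     # step cap reached or input exhausted


errors = []                       # a handler that records instead of raising
result = scan_declaration("HTML http://x>", 0, errors.append)
print(result, len(errors))
```

With a handler that returns, the same message is recorded over and over until 
the cap is hit; without the cap the loop would not end.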

For example, this declaration:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
http://www.w3.org/TR/html4/loose.dtd>

(which I think is generated by MS Office) will trigger this behaviour, 
because the system identifier is unquoted and the ':' in it is unexpected.

Two possible resolutions:

1) increment j and try the next character in this case

2) document that error() is not recoverable; i.e., it MUST raise an exception.

My preference is strongly for #1 (as HTML parsing should be forgiving, and 
HTMLParser is based upon markupbase).
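A rough sketch of what #1 could look like, again on a toy loop rather than 
the real parse_declaration. The function name scan_declaration_skipping and 
its allowed character set are assumptions made for this example; the only 
point is the extra j += 1 after the error is reported.

```python
def scan_declaration_skipping(rawdata, j, on_error):
    """Toy loop showing resolution #1; not the real markupbase code."""
    allowed = " \"'-/."           # hypothetical set of harmless characters
    while j < len(rawdata):
        c = rawdata[j]
        if c == ">":
            return j + 1          # end of declaration
        if c.isalnum() or c in allowed:
            j += 1
        else:
            on_error("unexpected %r char in declaration" % c)
            j += 1                # resolution #1: skip the offending char
    return -1                     # incomplete declaration


errors = []                       # a handler that records instead of raising
result = scan_declaration_skipping("HTML http://x>", 0, errors.append)
print(result, errors)
```

Here a non-raising handler is called once for the ':' and scanning carries on 
to the closing '>'. A handler that raises still aborts as before.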

----------
components: Library (Lib)
messages: 106938
nosy: mnot
priority: normal
severity: normal
status: open
title: markupbase declaration errors aren't recoverable
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8885>
_______________________________________