Ezio Melotti <ezio.melo...@gmail.com> added the comment:
There are at least a couple of issues here.

The first one is the way the parser handles '<![...'. The linked page contains markup like '<![STAT]-[USER-ACTIVE]!>', and since the parser currently checks for '<![' only, _markupbase.py:parse_marked_section gets called and an error is incorrectly raised. However, "8.2.4.42. Markup declaration open state"[0] states that after consuming '<!' there are only 4 valid paths forward:
1) if we have '<!--', it's a comment;
2) if we have '<!doctype', it's a doctype declaration;
3) if we have '<![CDATA[', it's a CDATA section;
4) if it's something else, it's a bogus comment.

The above example should therefore fall into 4) and be treated as a bogus comment. PR-9295 changes parse_html_declaration() to align with the spec and implement path 3), resulting in the webpage being parsed without errors (the invalid markup is treated as a bogus comment).

The second issue is about an EOF in the middle of a bogus markup declaration, as in the minified example provided by the OP ("<![\n"). In this case the comment should still be emitted ('[\n'), but currently nothing gets emitted. I'll look into it further either tomorrow or later this month and update the PR accordingly (or perhaps I'll open a separate issue).

[0]: https://www.w3.org/TR/html52/syntax.html#tokenizer-markup-declaration-open-state

----------
versions: +Python 2.7, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32876>
_______________________________________
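
For illustration only, here is a rough sketch of the four paths and of the expected bogus-comment data at EOF. This is not the code in PR-9295 or in html.parser. The helper names classify_markup_declaration() and bogus_comment_data() are made up for this example, and the sketch ignores the spec's extra condition that CDATA sections are only recognized in foreign (non-HTML) content.

```python
def classify_markup_declaration(rawdata, i=0):
    """Return which of the four paths applies to the '<!' at rawdata[i]."""
    if not rawdata.startswith('<!', i):
        raise ValueError("expected '<!' at position %d" % i)
    if rawdata.startswith('<!--', i):
        return 'comment'                      # path 1
    if rawdata[i:i + 9].lower() == '<!doctype':
        return 'doctype'                      # path 2 (case-insensitive)
    if rawdata.startswith('<![CDATA[', i):
        return 'cdata'                        # path 3 (case-sensitive)
    return 'bogus comment'                    # path 4: everything else


def bogus_comment_data(rawdata, i=0):
    """Return the comment text for a bogus comment starting at rawdata[i].

    The data runs from just after '<!' up to the next '>' or, if there is
    none, up to the end of the input (EOF).
    """
    start = i + 2
    end = rawdata.find('>', start)
    if end == -1:
        end = len(rawdata)                    # EOF: still emit what we have
    return rawdata[start:end]


if __name__ == '__main__':
    for sample in ('<!-- x -->', '<!DOCTYPE html>', '<![CDATA[x]]>',
                   '<![STAT]-[USER-ACTIVE]!>', '<![\n'):
        print(repr(sample), '->', classify_markup_declaration(sample))
```

With this sketch, both '<![STAT]-[USER-ACTIVE]!>' and "<![\n" fall into path 4, and bogus_comment_data("<![\n") yields '[\n', which is the comment expected in the EOF case described above.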