[issue670664] HTMLParser.py - more robust SCRIPT tag parsing

Éric Araujo Wed, 27 Jul 2011 08:17:00 -0700

Éric Araujo <[email protected]> added the comment:

Ezio wrote:
  >>> myhp.feed('<script><p>foo</p></script>')
  data: '<p>foo'  # where's the </p>?


http://www.w3.org/TR/html4/types#type-cdata says:
  Although the STYLE and SCRIPT elements use CDATA for their data
  model, for these elements, CDATA must be handled differently by user
  agents. Markup and entities must be treated as raw text and passed to
  the application as is. The first occurrence of the character sequence
  "</" (end-tag open delimiter) is treated as terminating the end of
  the element's content. In valid documents, this would be the end tag
  for the element.

So I think the example is invalid (the "</" inside the script should be
escaped), and that HTMLParser is not buggy.
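
To make the quoted rule concrete, here is a minimal sketch of it in
isolation. The helper name html4_cdata_content is hypothetical and is not
part of HTMLParser; it only applies the "first occurrence of </" rule to
the text that follows an opening <script> or <style> tag. The escaped
variant assumes the script is JavaScript, where "<\/" is a common way to
write the sequence inside a string literal.

```python
def html4_cdata_content(text):
    """Return the part of `text` that HTML 4 treats as element content.

    `text` is everything after the opening <script> or <style> tag.
    Hypothetical helper for illustration only, not an HTMLParser API.
    """
    end = text.find("</")  # first end-tag open delimiter
    return text if end == -1 else text[:end]


# Ezio's example: the content stops at the "</" of "</p>".
unescaped = html4_cdata_content("<p>foo</p></script>")

# With the delimiter escaped the JavaScript way ("<\/"), the content
# runs up to the real end tag instead.
escaped = html4_cdata_content(r"<p>foo<\/p></script>")

print(repr(unescaped))
print(repr(escaped))
```

Under this reading, the '<p>foo' reported for the unescaped input is what
the HTML 4 text asks for, and only the escaped form carries the whole
string through to the application.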

----------
versions: +Python 3.3 -Python 3.1

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue670664>
_______________________________________