[issue25258] HtmlParser doesn't handle void element tags correctly

Chenyun Yang Mon, 28 Sep 2015 12:27:10 -0700

New submission from Chenyun Yang:

Void elements such as <link> and <img> do not need an XHTML-style empty end
tag ("/>"). HTMLParser relies on that XHTML empty-element syntax to report an
end tag, so it fails to handle void elements written without it: only the
start tag is reported.


from HTMLParser import HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Encountered a start tag:", tag
    def handle_endtag(self, tag):
        print "Encountered an end tag :", tag
    def handle_data(self, data):
        print "Encountered some data  :", data

>>> parser = MyHTMLParser()
>>> parser.feed('<link rel="import"><img src="som">')
Encountered a start tag: link
Encountered a start tag: img
>>> parser.feed('<link rel="import"/><img src="som"/>')
Encountered a start tag: link
Encountered an end tag : link
Encountered a start tag: img
Encountered an end tag : img
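The following is one possible workaround. It is a sketch under stated assumptions and does not come from this report or from the library. The hypothetical subclass `VoidAwareParser` synthesizes an end-tag event for void elements, so both spellings above produce the same event sequence. The `VOID_ELEMENTS` set is assumed to mirror the void-element list in the W3C HTML5 syntax section cited in this report. The import fallback lets the same code run on Python 2.7 and Python 3. Events are collected in a list rather than printed.

```python
# Python 3 module name first; fall back to the Python 2.7 name.
try:
    from html.parser import HTMLParser
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser

# Assumed to match the void-element list in the W3C HTML5 syntax section.
VOID_ELEMENTS = frozenset([
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
])


class VoidAwareParser(HTMLParser):
    """Hypothetical subclass: reports an end tag for every void element."""

    def __init__(self):
        # Explicit base call works for both the Python 2 and 3 base classes.
        HTMLParser.__init__(self)
        self.events = []  # (kind, tag) tuples in document order

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        if tag in VOID_ELEMENTS:
            # A void element never has a closing tag, so emit one here.
            self.handle_endtag(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for the XHTML form, e.g. <img/>. The default implementation
        # calls handle_starttag and then handle_endtag, which would report a
        # void element's end tag twice given the override above.
        self.handle_starttag(tag, attrs)
        if tag not in VOID_ELEMENTS:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


p = VoidAwareParser()
p.feed('<link rel="import"><img src="som">')
p.close()
print(p.events)
# [('start', 'link'), ('end', 'link'), ('start', 'img'), ('end', 'img')]
```

Feeding '<link rel="import"/><img src="som"/>' instead gives the same list. The override of `handle_startendtag` prevents the duplicate end tag. A non-void self-closing tag such as `<div/>` still gets its start and end events, as before. With this approach, knowledge of void elements lives in the subclass and not in the tokenizer.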


Reference:
https://github.com/python/cpython/blob/bdfb14c688b873567d179881fc5bb67363a6074c/Lib/html/parser.py
http://www.w3.org/TR/html5/syntax.html#void-elements

----------
components: Library (Lib)
messages: 251792
nosy: Chenyun Yang
priority: normal
severity: normal
status: open
title: HtmlParser doesn't handle void element tags correctly
versions: Python 2.7

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25258>
_______________________________________