New submission from Chenyun Yang:
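For void elements such as <link> and <img>, an XHTML-style empty end tag ("/>")
is not required. HTMLParser, which relies on the XHTML empty-element syntax to
detect that such an element has ended, fails to handle this case: when <link> or
<img> is written without "/>", handle_endtag is never called.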
from HTMLParser import HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Encountered a start tag:", tag

    def handle_endtag(self, tag):
        print "Encountered an end tag :", tag

    def handle_data(self, data):
        print "Encountered some data :", data

parser = MyHTMLParser()
>>> parser.feed('<link rel="import"><img src="som">')
Encountered a start tag: link
Encountered a start tag: img
>>> parser.feed('<link rel="import"/><img src="som"/>')
Encountered a start tag: link
Encountered an end tag : link
Encountered a start tag: img
Encountered an end tag : img
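To make the dispatch behind this output concrete, here is a minimal sketch written for Python 3's html.parser (the module linked in the references below) rather than the Python 2 HTMLParser used above; it assumes Python 2 dispatches these two spellings the same way. CallbackRecorder and callbacks() are hypothetical names. It records which callback fires for each spelling of the same tag: "<link .../>" is routed to handle_startendtag, whose default implementation calls handle_starttag and then handle_endtag, while plain "<link ...>" only reaches handle_starttag.

```python
from html.parser import HTMLParser


class CallbackRecorder(HTMLParser):
    """Hypothetical helper: records which HTMLParser callback fires per tag."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        self.calls.append(("handle_starttag", tag))

    def handle_startendtag(self, tag, attrs):
        # Only reached for the "<tag .../>" spelling. Overriding it here stops
        # the base class from forwarding to handle_starttag + handle_endtag.
        self.calls.append(("handle_startendtag", tag))

    def handle_endtag(self, tag):
        self.calls.append(("handle_endtag", tag))


def callbacks(markup):
    """Feed markup to a fresh recorder and return the callbacks it saw."""
    parser = CallbackRecorder()
    parser.feed(markup)
    parser.close()
    return parser.calls


print(callbacks('<link rel="import">'))   # [('handle_starttag', 'link')]
print(callbacks('<link rel="import"/>'))  # [('handle_startendtag', 'link')]
```

The end-tag event in the second session above therefore comes only from the "/>" spelling; nothing in the parser consults a list of void elements.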
References:
https://github.com/python/cpython/blob/bdfb14c688b873567d179881fc5bb67363a6074c/Lib/html/parser.py
http://www.w3.org/TR/html5/syntax.html#void-elements
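Until the parser itself treats void elements specially, a caller-side workaround is possible. The sketch below assumes that every void element should be reported as immediately closed, however it is spelled. VoidAwareHTMLParser and parse_events are hypothetical names, VOID_ELEMENTS is transcribed from the void-element list in the W3C HTML5 syntax section referenced above, and the code targets Python 3's html.parser.

```python
from html.parser import HTMLParser

# Void elements as listed in the W3C HTML5 syntax section (assumed complete
# for the purposes of this sketch).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
})


class VoidAwareHTMLParser(HTMLParser):
    """Hypothetical subclass: reports an end tag for every void element,
    whether or not it was written with the XHTML "/>" syntax."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        if tag in VOID_ELEMENTS:
            # A void element can never have content, so close it right away.
            self.handle_endtag(tag)

    def handle_startendtag(self, tag, attrs):
        # The default implementation calls handle_starttag() then
        # handle_endtag(). handle_starttag() above already closes void
        # elements, so only add the end tag for non-void ones.
        self.handle_starttag(tag, attrs)
        if tag not in VOID_ELEMENTS:
            self.handle_endtag(tag)

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parse_events(markup):
    """Feed markup to a fresh parser and return its start/end events."""
    parser = VoidAwareHTMLParser()
    parser.feed(markup)
    parser.close()
    return parser.events


# Both spellings from the report now yield the same four events.
print(parse_events('<link rel="import"><img src="som">'))
print(parse_events('<link rel="import"/><img src="som"/>'))
```

Overriding handle_startendtag is needed because otherwise the "/>" spelling would produce two end-tag events for a void element. An explicit end tag such as </link> in the input is still passed to handle_endtag; the sketch does not try to suppress it.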
----------
components: Library (Lib)
messages: 251792
nosy: Chenyun Yang
priority: normal
severity: normal
status: open
title: HtmlParser doesn't handle void element tags correctly
versions: Python 2.7
_______________________________________
Python tracker
<http://bugs.python.org/issue25258>
_______________________________________
_______________________________________________
Python-bugs-list mailing list