New submission from Ademar Nowasky Junior <nowasky...@gmail.com>:

HTML tags that have an attribute name starting with a comma character aren't
parsed and break future calls to feed().

The problem occurs when such an attribute is the second one or later in the HTML
tag. It doesn't seem to occur when it's the first attribute.

#POC:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

parser = MyHTMLParser()

# This is OK
parser.feed('<yyy id="poc" a,="">')

# This breaks
parser.feed('<zzz id="poc" ,a="">')

# Future calls to feed() will not work
parser.feed('<img id="poc" src=x>')
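
To tell the failing tag apart from its effect on later calls, each input can be fed to a fresh parser. The following is only a minimal sketch: TagCollector and start_tags are hypothetical names that are not part of the report. It relies on the documented behaviour that feed() buffers incomplete data until more data is fed or close() is called. Whether that buffering explains the symptom above is an assumption, and the output may differ between Python versions.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    # A fresh parser per input, so data left buffered by one feed()
    # cannot influence how the next input is handled.
    parser = TagCollector()
    parser.feed(markup)
    parser.close()  # force processing of anything still buffered
    return parser.tags


for markup in ('<yyy id="poc" a,="">',
               '<zzz id="poc" ,a="">',
               '<img id="poc" src=x>'):
    print(markup, "->", start_tags(markup))
```

If the third input reports its tag here but not in the POC above, the shared parser state is what carries the failure over to later feed() calls.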

----------
components: Library (Lib)
messages: 376607
nosy: nowasky.jr
priority: normal
severity: normal
status: open
title: HTMLParser: parsing error
type: crash
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41748>
_______________________________________