[issue32876] HTMLParser raises exception on some inputs

2022-01-14 Thread Irit Katriel


Irit Katriel  added the comment:

Reopening to discuss what the correct behaviour should be.

--
resolution: out of date -> 
status: closed -> open
versions: +Python 3.11 -Python 2.7, Python 3.6, Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue32876] HTMLParser raises exception on some inputs

2022-01-14 Thread Hanno Boeck


Hanno Boeck  added the comment:

Now the example code raises an AssertionError(). Is that intended? I don't 
think that's any better.

I usually wouldn't expect an HTML parser to raise an error when passed a
string; I would expect fault-tolerant parsing instead. If some inputs are
expected to raise exceptions, that should at least be properly documented.
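
Until the behaviour is settled, a caller that cannot tolerate exceptions can
wrap the parser defensively. The following is only a minimal sketch: the
helper name try_parse is hypothetical, and the input "<![\n" is assumed here
as an example of a problematic string. Whether it raises at all, and which
exception type it raises, depends on the Python version.

```python
from html.parser import HTMLParser


def try_parse(markup):
    """Feed markup to a fresh HTMLParser.

    Returns None if parsing finished without an exception, otherwise the
    name of the exception type that was raised.
    """
    parser = HTMLParser()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as exc:  # broad on purpose: the type varies by version
        return type(exc).__name__
    return None


# Well-formed input parses without an exception.
print(try_parse("<p>ok</p>"))

# Assumed problematic input; the result depends on the Python version.
print(try_parse("<![\n"))
```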

--




[issue32876] HTMLParser raises exception on some inputs

2022-01-14 Thread Irit Katriel


Irit Katriel  added the comment:

The error() method was removed in issue31844.

--
resolution:  -> out of date
stage: patch review -> resolved
status: open -> closed
superseder:  -> HTMLParser: undocumented not implemented method




[issue32876] HTMLParser raises exception on some inputs

2018-09-14 Thread Ezio Melotti


Change by Ezio Melotti :


--
keywords: +patch
pull_requests: +8724
stage:  -> patch review




[issue32876] HTMLParser raises exception on some inputs

2018-08-23 Thread Berker Peksag


Berker Peksag  added the comment:

Issue 34480 is another relevant issue. The HTMLParser class doesn't have an
error() method and it doesn't raise any exceptions, but its base class still
does. I think there is a compatibility problem between the
html.parser.HTMLParser and _markupbase.ParserBase classes. See
https://bugs.python.org/msg323966 for more details about this.

--
nosy: +berker.peksag




[issue32876] HTMLParser raises exception on some inputs

2018-02-25 Thread Ezio Melotti

Change by Ezio Melotti :


--
assignee:  -> ezio.melotti




[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Ezio Melotti

Ezio Melotti  added the comment:

The HTMLParser has been updated to handle HTML5 and should never fail parsing a 
document, so if it raises an error it's probably a bug.
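
As an illustration of that leniency, the sketch below feeds markup with
unclosed tags to a small subclass. The class name TagCollector and the sample
markup are made up for this example; only HTMLParser and its handle_starttag
hook come from the stdlib.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


collector = TagCollector()
# Neither <p> nor <b> is ever closed; the parser reports both start tags
# without raising.
collector.feed("<p>unclosed <b>bold")
collector.close()
print(collector.start_tags)
```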

--




[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Hanno Boeck

Hanno Boeck  added the comment:

Actually, BeautifulSoup also uses the Python HTML parser as its backend, so it
has the same problem. (It can use alternative backends, but the Python parser
is the default, and they also describe it as "lenient", which I would interpret
as "it can handle that".)

--




[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Steven D'Aprano

Steven D'Aprano  added the comment:

The stdlib HTML parser requires correct HTML.

To parse broken HTML, as you find in the real world, you need a third-party 
library like BeautifulSoup. BeautifulSoup is much more complex (about 7-8 times 
as many LOC) but can handle nearly anything a browser can.

I doubt the stdlib will ever compete with BeautifulSoup.

--
nosy: +steven.daprano




[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
nosy: +ezio.melotti
