[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-28 Thread Senthil Kumaran
Senthil Kumaran orsent...@gmail.com added the comment: Fixed this in r87542 (py3k). unescape is an undocumented helper method, so no docs were added. There was already an issue (Issue6662) on malformed charref handling, and it is fixed. -- resolution: -> fixed stage: patch review ->

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-28 Thread Senthil Kumaran
Senthil Kumaran orsent...@gmail.com added the comment: r87544 (release27-maint) and r87545 (release31-maint). -- status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10759 ___

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-23 Thread Senthil Kumaran
Senthil Kumaran orsent...@gmail.com added the comment: Yes, I too agree that HTMLParser.unescape() should spit out malformed char-refs unchanged, just as browsers do. But as the unescape function has been undocumented/unexposed for releases, I am not sure exposing it is a good idea. HTMLParser is

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Changes by Martin Potthast martin.potth...@googlemail.com: -- title: HTMLParser.unescape() cannot handle HTML entities with incorrect syntax (e.g. &#hearts;) -> HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Martin Potthast martin.potth...@googlemail.com added the comment: I'd suggest verifying the input more carefully and returning such strings unchanged. -- type: -> behavior

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Leaving the input unchanged does seem to be what browsers do. (Issue 7626 has some info on browser behaviour with invalid entity refs.) Rather than pre-validating the input, I think the exception can be caught and the putative entity
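
A minimal sketch of the approach suggested in this comment (catch the conversion error and leave the putative entity in place), assuming Python 3. The helper name unescape_lenient and its regular expression are hypothetical and chosen only for illustration; this is not the patch that was committed for this issue.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern: "&", an optional "#", word characters, then ";".
_REF = re.compile(r"&(#?\w+);")


def unescape_lenient(text):
    """Replace well-formed references; return malformed ones unchanged."""

    def replace(match):
        ref = match.group(1)
        try:
            if ref.startswith("#"):
                digits = ref[1:]
                if digits[:1] in ("x", "X"):
                    # Hexadecimal character reference, e.g. &#x2665;
                    return chr(int(digits[1:], 16))
                # Decimal character reference, e.g. &#9829;
                return chr(int(digits))
            # Named entity reference, e.g. &hearts;
            return chr(name2codepoint[ref])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or non-numeric body such as "#hearts":
            # keep the original text instead of raising.
            return match.group(0)

    return _REF.sub(replace, text)


if __name__ == "__main__":
    print(unescape_lenient("I &#hearts; Python"))
```

Catching the exception rather than pre-validating keeps the change small: the same code path handles unknown names, non-numeric bodies, and out-of-range code points.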

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Martin Potthast martin.potth...@googlemail.com added the comment: Agreed. Here's a patch for HTMLParser. That was easy enough. With regard to tests, there seems to be already one called test_malformatted_charref in test_htmlparser.py. However, the test tests the whole parser and not only

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Ah, as an undocumented internal interface it may in fact not be appropriate to make this change. Or it may be. I'll have to look at the code in more detail to figure that out, or perhaps Senthil will. (It may even be time to document

[issue10759] HTMLParser.unescape() fails on HTML entities with incorrect syntax (e.g. &#hearts;)

2010-12-22 Thread Martin Potthast
Martin Potthast martin.potth...@googlemail.com added the comment: Why not simply remove the additional check in line 168 and leave the responsibility to check the validity of its input to the unescape function (be it explicitly or, like now, lazily)? That way, the code changes are minimal,