[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-05-13 Thread Ezio Melotti
Ezio Melotti added the comment: What I described in my previous message is what Firefox does. If you think this should be changed, I suggest you open another issue, possibly attaching a test case with the desired behavior and a patch to change it.

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-21 Thread Paweł Widera
Paweł Widera added the comment: No. As the value of the href attribute is not supposed to contain spaces, I'd rather expect the parser to assume that there is an ending " missing before the space.

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-14 Thread Ezio Melotti
Ezio Melotti added the comment: So you are suggesting that <a href="http://xxx.org/xxx.php?a=1 target="_blank">click me</a> should result in an 'a' element with an href attribute equal to "http://xxx.org/xxx.php?a=1 target=" and then discard _blank" as extra data?
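
The question above can be checked directly by feeding the markup to the parser and printing what it reports. This is a minimal sketch, assuming Python 3's html.parser and assuming the test case was an 'a' tag whose href value lacks its closing quote; the AttrLogger class is a hypothetical helper, not part of the library or of any patch in this issue.

```python
from html.parser import HTMLParser


class AttrLogger(HTMLParser):
    """Hypothetical helper: records each start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs as the parser understood them
        self.seen.append((tag, attrs))


# Assumed form of the second test case: the closing quote of href is missing.
markup = '<a href="http://xxx.org/xxx.php?a=1 target="_blank">click me</a>'

parser = AttrLogger()
parser.feed(markup)
parser.close()

for tag, attrs in parser.seen:
    print(tag, attrs)
```

The exact attribute list printed may differ between Python versions, so none is shown here. On the Python 2.6 parser this report was filed against, the thread says the same input raised an exception instead.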

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-14 Thread Paweł Widera
Paweł Widera added the comment: Great! With one "but"... the second case *is* handled by browsers. Browsers do not throw an exception on it as HTMLParser does. So improvement is definitely possible here. Whether it is worth the effort is not for me to judge.

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-13 Thread Ezio Melotti
Ezio Melotti added the comment: The first case has been fixed already in 1cbfeffea19f, the second case is not even handled by browsers, so I'm closing this. -- resolution: -> fixed stage: -> committed/rejected status: open -> closed

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-05 Thread Ezio Melotti
Changes by Ezio Melotti: -- versions: +Python 3.2, Python 3.3 -Python 2.6

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-06 Thread Ezio Melotti
Ezio Melotti added the comment: BeautifulSoup uses SGMLParser for all versions <3.1. BeautifulSoup 3.1 is supposed to be compatible with Python 3 and since SGMLParser is gone it's now using HTMLParser, but it's not able to handle some things anymore. For more information: http://www.crummy.

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Georg Brandl
Georg Brandl added the comment: So BeautifulSoup is using HTMLParser? That is interesting, because they claim to support "broken" HTML. In any case, if a "quirky" mode is added, it should have to be turned on explicitly by a flag. -- resolution: wont fix ->

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread R. David Murray
R. David Murray added the comment: In doing web scraping I started using BeautifulSoup precisely because it was very lenient in what HTML it accepted (I haven't written such an app for a while, so I'm not sure what BeautifulSoup currently does...I thought I heard it was now using HTMLParser...).

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Georg Brandl
Georg Brandl added the comment: > Throwing an exception and giving up is just not good enough. Yes it is, in some cases. There are "forgiving" HTML parsers out there; HTMLParser does not strive to be one. There are *so many* cases where HTML is a bit malformed that it takes more than just two

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Paweł Widera
Paweł Widera added the comment: It depends whether you want HTMLParser to be a useful tool that can deal with real world HTML or just a toy without practical meaning. Crashing on every little deviation from the standard, where a more relaxed approach is possible, doesn't sound to me as a reason

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Georg Brandl
Georg Brandl added the comment: I do not think HTMLParser should guess. Guessing always opens the door to misinterpretation. -- nosy: +georg.brandl resolution: -> wont fix status: open -> closed

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Paweł Widera
New submission from Paweł Widera: Of course neither is correct HTML, but both are easy to guess, so I believe the parser should not give up too quickly here. 1) extra comma between attributes 2) missing closing quotation mark for the first attribute http://xxx.org/xxx.php?a=1 target="_blank">click