Bugs item #1117302, was opened at 2005-02-06 15:04
Message generated for change (Comment added) made by effbot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1117302&group_id=5470
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Paul Birnie (pbirnie)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib.SGMLParser

Initial Comment:
sgmllib.SGMLParser calls the start and end tag methods correctly until it encounters

<a title="link1" href="url1">One</a> <br/><a title="link2" href="someurl2">Two</a> <a title="link2" href="url3">Three</a>

The <br/> seems to confuse its parsing, and I only get callbacks for tag a twice (links 1 and 3).

----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:03

Message:
Logged In: YES
user_id=38376

footnote 2: if you need to deal with broken HTML, use TidyLib:

http://utidylib.berlios.de/
http://effbot.org/zone/element-tidylib.htm

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:01

Message:
Logged In: YES
user_id=38376

footnote: <br/> is an XML construct, and is not valid HTML. In HTML, "<tag/blah/" is short for "<tag>blah</tag>", so the BR section is parsed as

START br
DATA ><a title="link2" href="someurl2">Two<
END br
DATA a>

which is 100% correct.

For more on this topic, see:

http://www.cs.tut.fi/~jkorpela/html/empty.html

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
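As a rough illustration of the SGML shorthand effbot describes ("<tag/blah/" meaning "<tag>blah</tag>"), here is a hypothetical toy tokenizer. It is a sketch, not sgmllib's actual code: `tokenize_net` and `NET_TAG` are names made up for this example, it assumes Python 3 with only the standard `re` module, and it models nothing except the null-end-tag rule (ordinary tags and attributes are passed through as plain data).

```python
import re

# Hypothetical toy pattern: matches "<name/" followed by any run of
# non-slash characters and a closing "/", i.e. the SGML null-end-tag form.
# This is NOT how sgmllib is implemented; it only illustrates the rule.
NET_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)/([^/]*)/")


def tokenize_net(text):
    """Split text into (kind, value) events, expanding "<tag/data/" forms.

    Everything outside a NET-style tag is reported as a single DATA chunk.
    """
    events = []
    pos = 0
    for m in NET_TAG.finditer(text):
        # Text before this NET tag is plain character data.
        if m.start() > pos:
            events.append(("DATA", text[pos:m.start()]))
        tag, content = m.group(1), m.group(2)
        events.append(("START", tag))
        if content:
            # The content runs up to the next "/", which acts as the end tag.
            events.append(("DATA", content))
        events.append(("END", tag))
        pos = m.end()
    # Whatever follows the last NET tag is trailing character data.
    if pos < len(text):
        events.append(("DATA", text[pos:]))
    return events


if __name__ == "__main__":
    sample = '<br/><a title="link2" href="someurl2">Two</a>'
    for kind, value in tokenize_net(sample):
        print(kind, value)
    # START br
    # DATA ><a title="link2" href="someurl2">Two<
    # END br
    # DATA a>
```

Run on the submitter's full line, this sketch also swallows the second link into the br element's character data, leaving only the first and third <a> tags as real markup. That is consistent with the reported callbacks for links 1 and 3, though a real SGML parser handles many more cases than this toy does.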