Patches item #1486713, was opened at 2006-05-11 18:19
Message generated for change (Comment added) made by jjlee
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1486713&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Library (Lib)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: kxroberto (kxroberto)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser : A auto-tolerant parsing mode

Initial Comment:
Changes:

* Missing spaces between attributes, as often seen on the web, are now
  accepted, for example:

    <script type="text/javascript"language="JavaScript1.1">

  Previously this broke the whole parse.

* A fully auto-tolerant mode (HTMLParser.tolerant=1) was added. It should
  hopefully NEVER break HTML parsing at the HTMLParser level, but instead
  recover and continue parsing sensibly. The mode was tested extensively
  with complex pages. In tolerant mode, all pending HTML is guaranteed to be
  processed only during HTMLParser.close() / goahead(end=True); the previous
  (stalling) policy was the same.

  Maybe a steep step: I have switched tolerant mode ON by default, as this is
  what one wants in 99.9% of cases. (I have maybe 20 applications using
  HTMLParser, and none of them likes unrecoverable breaks with exceptions.)

  In tolerant mode the virtual method .warning(message, i, k) is called
  instead of .error(); by default it just increments .warning_count. This
  framework should even make it possible to write po HTML checkers.

* The patch was generated against py2.3 (still the "good/base" Python for me)
  and also fixes a regexp bug (which was already fixed in py2.4.2). The patch
  also applies against py2.4/2.5, although 2 locations where py24 trivially
  changed to %r/repr may grumble.

-robert

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2007-01-30 02:32

Message:
Logged In: YES
user_id=261020
Originator: NO

This badly needs unit tests.
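A unit test for the missing-space case from the initial comment might look like the sketch below. It is hedged in several ways: it targets Python 3's html.parser module (the successor of the 2.x HTMLParser module this patch modifies), it does not exercise the patch's tolerant flag or warning() hook, which are not part of the standard library, and the names StartTagRecorder, parse_starttags and MissingAttributeSpaceTest are hypothetical. On current Python 3 releases the stock tolerant attribute matching is expected to split adjacent quoted attributes, so these tests should pass there.

```python
import unittest
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        self.starttags.append((tag, attrs))


def parse_starttags(markup):
    """Feed markup, flush it with close(), and return the recorded start tags."""
    parser = StartTagRecorder()
    parser.feed(markup)
    parser.close()  # close() processes any remaining buffered input
    return parser.starttags


class MissingAttributeSpaceTest(unittest.TestCase):
    def test_adjacent_double_quoted_attributes(self):
        # The example from the patch description: no space before "language".
        tags = parse_starttags(
            '<script type="text/javascript"language="JavaScript1.1"></script>'
        )
        self.assertEqual(
            tags,
            [("script", [("type", "text/javascript"),
                         ("language", "JavaScript1.1")])],
        )

    def test_adjacent_single_quoted_attributes(self):
        tags = parse_starttags("<img src='a.png'alt='logo'>")
        self.assertEqual(tags, [("img", [("src", "a.png"), ("alt", "logo")])])


if __name__ == "__main__":
    # exit=False makes unittest.main() return instead of calling sys.exit().
    unittest.main(argv=["missing-attribute-space"], exit=False)
```

Recording start tags through a small subclass keeps each assertion independent of the rest of the parser's callbacks, so further malformed-markup cases can be added as one-line test methods.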
----------------------------------------------------------------------

Comment By: kxroberto (kxroberto)
Date: 2006-05-23 16:15

Message:
Logged In: YES
user_id=972995

(It also works for Python 2.5.)

----------------------------------------------------------------------

Comment By: kxroberto (kxroberto)
Date: 2006-05-23 16:11

Message:
Logged In: YES
user_id=972995

Python 2.4 version of the patch added.

----------------------------------------------------------------------

_______________________________________________
Patches mailing list
Patches@python.org
http://mail.python.org/mailman/listinfo/patches