Bugs item #1504333, was opened at 2006-06-11 05:58
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1504333&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Sam Ruby (rubys)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib should allow angle brackets in quoted values

Initial Comment:
Real live example (search for "other<br />corrections")

  http://latticeqcd.blogspot.com/2006/05/non-relativistic-qcd.html

This addresses the following (included in the file):

  # XXX The following should skip matching quotes (' or ")

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-01-11 22:04

Message:
Logged In: YES
user_id=33168
Originator: NO

You should be able to check yourself. Use the current version of Python,
apply the test case from the original patch and your patch to the code.
If the test passes, I'll be happy to check in the fix.

If that does work, please create a new patch with your code and the test
case from the original patch.

----------------------------------------------------------------------

Comment By: Haejoong Lee (haepal)
Date: 2007-01-11 10:01

Message:
Logged In: YES
user_id=135609
Originator: NO

Could someone check if the following patch fixes the problem?
This patch was made against revision 51854.

--- sgmllib.py.org	2006-11-06 02:31:12.000000000 -0500
+++ sgmllib.py	2007-01-11 12:39:30.000000000 -0500
@@ -16,6 +16,35 @@
 
 # Regular expressions used for parsing
 
+class MyMatch:
+    def __init__(self, i):
+        self._i = i
+    def start(self, i):
+        return self._i
+
+class EndBracket:
+    def search(self, data, index):
+        s = data[index:]
+        bs = None
+        quote = None
+        for i,c in enumerate(s):
+            if bs:
+                bs = False
+            else:
+                if c == '<' or c == '>':
+                    if quote is None:
+                        break
+                elif c == "'" or c == '"':
+                    if c == quote:
+                        quote = None
+                    else:
+                        quote = c
+                elif c == '\\':
+                    bs = True
+        else:
+            return None
+        return MyMatch(i+index)
+
 interesting = re.compile('[&<]')
 incomplete = re.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|'
                            '<([a-zA-Z][^<>]*|'
@@ -29,7 +58,8 @@
 shorttagopen = re.compile('<[a-zA-Z][-.a-zA-Z0-9]*/')
 shorttag = re.compile('<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')
 piclose = re.compile('>')
-endbracket = re.compile('[<>]')
+#endbracket = re.compile('[<>]')
+endbracket = EndBracket()
 tagfind = re.compile('[a-zA-Z][-_.a-zA-Z0-9]*')
 attrfind = re.compile(
     r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-09-10 21:26

Message:
Logged In: YES
user_id=33168

I reverted the patch and added the test case for sgml so the
infinite loop doesn't recur. This was mentioned several times on
python-dev.

Committed revision 51854. (head)
Committed revision 51850. (2.5)
Committed revision 51853. (2.4)

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-06-29 10:17

Message:
Logged In: YES
user_id=3066

I checked in a modified version of this patch: changed to use
separate REs for start and end tags to reduce matching cost
for end tags; extended tests; updated to avoid breaking
previous changes to support IPv6 addresses in unquoted
attribute values.

Committed as revisions 47154 (trunk) and 47155
(release24-maint).

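For readers following the thread, the behaviour under discussion can be illustrated with a minimal sketch. It is not the code from either patch and not part of sgmllib: `find_end_bracket` is a hypothetical helper, and the sample tag is made up around the "other<br />corrections" text cited in the initial comment. Unlike the patch posted in this thread, the sketch assumes that only the quote character that opened a value can close it, and it does not treat backslashes as escapes.

```python
import re

# The pattern named in the thread: matches the first '<' or '>' anywhere,
# including inside a quoted attribute value.
endbracket_re = re.compile('[<>]')


def find_end_bracket(data, index=0):
    """Return the offset of the first '<' or '>' outside quotes, or -1.

    Hypothetical helper for illustration only; not an sgmllib API.
    """
    quote = None  # the quote character currently open, if any
    for i in range(index, len(data)):
        c = data[i]
        if quote is not None:
            if c == quote:
                quote = None  # matching quote closes the value
        elif c in ('"', "'"):
            quote = c  # a quote outside any value opens one
        elif c in '<>':
            return i  # bracket outside quotes ends the tag
    return -1  # no unquoted bracket found


# Made-up tag; start scanning at offset 1 to skip the opening '<'.
tag = '<a title="other<br />corrections" href="#">'
regex_end = endbracket_re.search(tag, 1).start()
quoted_end = find_end_bracket(tag, 1)
print(regex_end, quoted_end)
```

With this input the regular expression stops at the `<` inside the quoted title, while the quote-aware scan runs to the final `>` of the tag.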