Bugs item #1504333, was opened at 2006-06-11 08:58
Message generated for change (Comment added) made by haepal
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1504333&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Sam Ruby (rubys)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib should allow angle brackets in quoted values

Initial Comment:
Real live example (search for "other<br />corrections")

http://latticeqcd.blogspot.com/2006/05/non-relativistic-qcd.html

This addresses the following (included in the file):

# XXX The following should skip matching quotes (' or ")
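
A minimal sketch of the failure, assuming the '[<>]' endbracket pattern
that appears in the sgmllib.py context quoted later in this thread. The
input string is a hypothetical tag modeled on the example above, not
copied from the linked page.

```python
import re

# Pattern sgmllib used to locate the end of a start tag (per the diff context).
endbracket = re.compile('[<>]')

# Hypothetical input: a quoted attribute value containing "<br />".
rawdata = '<a title="other<br />corrections" href="x">link</a>'

# Search from just past the opening "<", as a tag scanner would.
match = endbracket.search(rawdata, 1)
found = match.start(0)

# Where the start tag really ends: the ">" after the href attribute.
true_end = rawdata.index('>', rawdata.index('href'))

# The match lands on the "<" inside the quoted value, well before the
# real end of the tag, so the tag is cut short.
print(rawdata[found], found < true_end)
```

Because the pattern knows nothing about quotes, the first angle bracket
inside the title value is taken as the end of the tag.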


----------------------------------------------------------------------

Comment By: Haejoong Lee (haepal)
Date: 2007-01-11 13:01

Message:
Logged In: YES 
user_id=135609
Originator: NO

Could someone check if the following patch fixes the problem?
This patch was made against revision 51854.

--- sgmllib.py.org      2006-11-06 02:31:12.000000000 -0500
+++ sgmllib.py  2007-01-11 12:39:30.000000000 -0500
@@ -16,6 +16,35 @@
 
 # Regular expressions used for parsing
 
+class MyMatch:
+    def __init__(self, i):
+        self._i = i
+    def start(self, i):
+        return self._i
+    
+class EndBracket:
+    def search(self, data, index):
+        s = data[index:]
+        bs = None
+        quote = None
+        for i,c in enumerate(s):
+            if bs:
+                bs = False
+            else:
+                if c == '<' or c == '>':
+                    if quote is None:
+                        break
+                elif c == "'" or c == '"':
+                    if quote is None:
+                        quote = c
+                    elif c == quote:
+                        quote = None
+                elif c == '\\':
+                    bs = True
+        else:
+            return None
+        return MyMatch(i+index)
+        
 interesting = re.compile('[&<]')
 incomplete = re.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|'
                            '<([a-zA-Z][^<>]*|'
@@ -29,7 +58,8 @@
 shorttagopen = re.compile('<[a-zA-Z][-.a-zA-Z0-9]*/')
 shorttag = re.compile('<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')
 piclose = re.compile('>')
-endbracket = re.compile('[<>]')
+#endbracket = re.compile('[<>]')
+endbracket = EndBracket()
 tagfind = re.compile('[a-zA-Z][-_.a-zA-Z0-9]*')
 attrfind = re.compile(
     r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
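
For anyone who wants to try the idea outside sgmllib, here is a
standalone sketch of a quote-aware scan. The function name
find_end_bracket is hypothetical and is not part of sgmllib or of the
patch. Unlike the patch, it does not treat a backslash as an escape, it
indexes into the input instead of slicing it, and it returns -1 rather
than None when nothing is found. A quote of the other kind inside a
quoted value is treated as ordinary text.

```python
def find_end_bracket(data, start=0):
    """Return the index of the first '<' or '>' outside quotes, or -1."""
    quote = None  # the quote character currently open, if any
    for i in range(start, len(data)):
        c = data[i]
        if quote is not None:
            # Inside a quoted value: only the matching quote closes it.
            if c == quote:
                quote = None
        elif c in ('"', "'"):
            quote = c
        elif c in '<>':
            return i
    return -1


# Hypothetical input modeled on the reported case.
rawdata = '<a title="other<br />corrections" href="x">link</a>'
end = find_end_bracket(rawdata, 1)
print(rawdata[:end + 1])
```

With this input the scan skips the brackets inside the title value and
stops at the ">" that closes the start tag.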

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-09-11 00:26

Message:
Logged In: YES 
user_id=33168

I reverted the patch and added the test case for sgml so the
infinite loop doesn't recur.  This was mentioned several
times on python-dev.

Committed revision 51854. (head)
Committed revision 51850. (2.5)
Committed revision 51853. (2.4)


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-06-29 13:17

Message:
Logged In: YES 
user_id=3066

I checked in a modified version of this patch: changed to
use separate REs for start and end tags to reduce matching
cost for end tags; extended tests; updated to avoid breaking
previous changes to support IPv6 addresses in unquoted
attribute values.

Committed as revisions 47154 (trunk) and 47155
(release24-maint).

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
