Bugs item #1504333, was opened at 2006-06-11 05:58
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1504333&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Sam Ruby (rubys)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib should allow angle brackets in quoted values

Initial Comment:
Real-life example (search for "other<br />corrections"):

http://latticeqcd.blogspot.com/2006/05/non-relativistic-qcd.html

This addresses the following comment, which is already in sgmllib.py:

# XXX The following should skip matching quotes (' or ")
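To see why the XXX comment matters, here is a minimal Python 3 sketch of the failure. It reuses the `endbracket = re.compile('[<>]')` pattern shown in the patch below and assumes the parser starts searching just past the opening `<`; the sample tag is hypothetical, modelled on the reported page.

```python
import re

# Pattern sgmllib used to find the end of a start tag (the line the patch
# below removes). It stops at the first '<' or '>' anywhere, even inside
# a quoted attribute value.
endbracket = re.compile('[<>]')

# Hypothetical start tag: the title attribute contains a literal "<br />".
tag = '<a title="other<br />corrections" href="x">'

# Search from just after the opening '<' (an assumption about the caller).
match = endbracket.search(tag, 1)
print(match.start(0))        # → 15, the '<' inside the quoted value
print(tag[:match.start(0)])  # → <a title="other
```

The tag is cut off in the middle of the quoted value, so the rest of the attribute text ends up treated as document content.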


----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-01-11 22:04

Message:
Logged In: YES 
user_id=33168
Originator: NO

You should be able to check this yourself.  Take the current version of
Python, then apply the test case from the original patch and your patch to
the code.  If the test passes, I'll be happy to check in the fix.  If it
does, please create a new patch containing your code and the test case from
the original patch.

----------------------------------------------------------------------

Comment By: Haejoong Lee (haepal)
Date: 2007-01-11 10:01

Message:
Logged In: YES 
user_id=135609
Originator: NO

Could someone check if the following patch fixes the problem?
This patch was made against revision 51854.

--- sgmllib.py.org      2006-11-06 02:31:12.000000000 -0500
+++ sgmllib.py  2007-01-11 12:39:30.000000000 -0500
@@ -16,6 +16,35 @@
 
 # Regular expressions used for parsing
 
+class MyMatch:
+    def __init__(self, i):
+        self._i = i
+    def start(self, i):
+        return self._i
+    
+class EndBracket:
+    def search(self, data, index):
+        s = data[index:]
+        bs = None
+        quote = None
+        for i,c in enumerate(s):
+            if bs:
+                bs = False
+            else:
+                if c == '<' or c == '>':
+                    if quote is None:
+                        break
+                elif c == "'" or c == '"':
+                    if quote is None:
+                        quote = c
+                    elif c == quote:
+                        quote = None
+                elif c == '\\':
+                    bs = True
+        else:
+            return None
+        return MyMatch(i+index)
+        
 interesting = re.compile('[&<]')
 incomplete = re.compile('&([a-zA-Z][a-zA-Z0-9]*|#[0-9]*)?|'
                            '<([a-zA-Z][^<>]*|'
@@ -29,7 +58,8 @@
 shorttagopen = re.compile('<[a-zA-Z][-.a-zA-Z0-9]*/')
 shorttag = re.compile('<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')
 piclose = re.compile('>')
-endbracket = re.compile('[<>]')
+#endbracket = re.compile('[<>]')
+endbracket = EndBracket()
 tagfind = re.compile('[a-zA-Z][-_.a-zA-Z0-9]*')
 attrfind = re.compile(
     r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
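For experimenting outside sgmllib (which is not available in Python 3), a standalone sketch of the same quote-aware scan follows. It is not the posted patch: the class names `BracketMatch` and `QuoteAwareEndBracket` are hypothetical, it assumes the caller only uses `.start()` on the result, it ignores a quote of the other kind inside a quoted value (so `'say "hi"'` is one value), and it drops the backslash-escape handling because HTML/SGML attribute values have no backslash escapes.

```python
from typing import Optional


class BracketMatch:
    """Minimal stand-in for a regex match object; only .start() is provided,
    on the assumption that the caller uses nothing else."""

    def __init__(self, pos: int) -> None:
        self._pos = pos

    def start(self, group: int = 0) -> int:
        return self._pos


class QuoteAwareEndBracket:
    """Find the next '<' or '>' that is not inside a '...' or "..." value."""

    def search(self, data: str, index: int = 0) -> Optional[BracketMatch]:
        quote = None  # the quote character currently open, if any
        for pos in range(index, len(data)):
            c = data[pos]
            if quote is not None:
                # Inside a quoted value only the matching quote matters;
                # brackets and the other quote character are literal text.
                if c == quote:
                    quote = None
            elif c in "'\"":
                quote = c
            elif c in "<>":
                return BracketMatch(pos)
        # No unquoted bracket found (possibly an unterminated quote):
        # report no match so the caller can wait for more data.
        return None


endbracket = QuoteAwareEndBracket()
tag = '<a title="other<br />corrections" href="x">'
m = endbracket.search(tag, 1)
print(m.start(0))  # index of the real closing '>'
```

One trade-off of any quote-aware scan: a stray unmatched quote makes everything after it look like one open value, so the scan returns no match until more input arrives.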

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-09-10 21:26

Message:
Logged In: YES 
user_id=33168

I reverted the patch and added the test case for sgmllib so the
infinite loop doesn't recur.  The infinite loop was mentioned
several times on python-dev.

Committed revision 51854. (head)
Committed revision 51850. (2.5)
Committed revision 51853. (2.4)


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-06-29 10:17

Message:
Logged In: YES 
user_id=3066

I checked in a modified version of this patch: changed to
use separate REs for start and end tags to reduce matching
cost for end tags; extended tests; updated to avoid breaking
previous changes to support IPv6 addresses in unquoted
attribute values.
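The idea of separate regular expressions for start and end tags can be sketched as follows. The patterns and the helpers `find_starttag_end` / `find_endtag_end` are hypothetical illustrations, not the expressions committed in revisions 47154/47155: start tags skip over quoted values, while end tags, which carry no attributes, keep the cheap `[<>]` search.

```python
import re

# Hypothetical patterns, NOT the ones actually committed; they only show
# the split between a start-tag RE and an end-tag RE.

# Start tags may have attributes, so consume unquoted characters or whole
# quoted values until an unquoted bracket. Group 1 is that bracket.
starttag_end = re.compile(r"""(?:[^<>'"]|"[^"]*"|'[^']*')*([<>])""")

# End tags have no attribute values, so the simple pattern is enough.
endtag_end = re.compile(r"[<>]")


def find_starttag_end(data, i):
    """Index of the bracket ending the start tag that begins at data[i], or -1."""
    m = starttag_end.match(data, i + 1)
    return m.start(1) if m else -1


def find_endtag_end(data, i):
    """Index of the bracket ending the end tag that begins at data[i], or -1."""
    m = endtag_end.search(data, i + 1)
    return m.start(0) if m else -1


tag = '<a title="other<br />corrections" href="x">'
print(find_starttag_end(tag, 0))  # index of the final '>'
print(find_endtag_end('</a>', 0))
```

Keeping the end-tag search as a plain character-class scan avoids paying for the quote-skipping alternation on every end tag.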

Committed as revisions 47154 (trunk) and 47155
(release24-maint).

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 