It looks like revision 47154 introduced a regexp that hangs Python under 
some circumstances: Ctrl-C won't kill the process, and CPU usage sits near 
100%.  There's a test case here:

http://python.org/sf/1541697


The problem isn't seen if you read the whole file at once (or almost the 
whole file at once).  (But that doesn't make it a non-bug, AFAICS.)

I'm not sure what the problem is, but presumably the relevant part of the 
patch is this:

+starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
+        r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
+        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
+        r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
+    r')*\s*/?\s*(?=[<>])')
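
One possible explanation, offered as a guess rather than a diagnosis: the 
outer group is repeated with `)*`, and the whitespace between attributes is 
optional, so a run of name characters can be cut into attribute names in 
exponentially many ways.  If the buffer ends before the closing `>` (which 
could happen when the file is fed in pieces), the final lookahead fails and 
the matcher may try every one of those cuts.  The sketch below uses a much 
simplified pattern, not the sgmllib regexp itself; `simplified_starttag`, 
`count_splits` and `time_failed_match` are names made up for illustration.

```python
import re
import time

# Simplified stand-in for the starttag pattern (an assumption, not the
# sgmllib regexp): a tag name, then a repeated group of attribute names
# whose separating whitespace is optional, then a lookahead for < or >.
simplified_starttag = re.compile(r'<[a-zA-Z]+\s*(\s*[a-zA-Z_]+)*\s*(?=[<>])')


def count_splits(n):
    """Number of ways to cut a run of n name characters into names."""
    # Each of the n - 1 gaps between characters is either a cut or not.
    return 2 ** (n - 1) if n > 0 else 1


def time_failed_match(n):
    """Time a match against a tag that is cut off before its closing >."""
    truncated = '<a ' + 'b' * n
    start = time.perf_counter()
    result = simplified_starttag.match(truncated)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    # Elapsed time should grow roughly in step with count_splits(n).
    for n in (8, 12, 16, 20):
        result, elapsed = time_failed_match(n)
        print(n, count_splits(n), result, '%.4fs' % elapsed)
```

With the closing `>` present the same pattern matches immediately, which 
would be consistent with the hang not showing up when the whole file is 
read at once.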


The patch attached to bug 1515142 (also from Sam Ruby -- claims to fix a 
regression introduced by his recent sgmllib patches, and has not yet been 
applied) does NOT fix the problem.

If nobody has time to fix this, perhaps rev 47154 should be reverted?


commit message for -r47154:

"""
SF bug #1504333: sgmlib should allow angle brackets in quoted values
(modified patch by Sam Ruby; changed to use separate REs for start and end
  tags to reduce matching cost for end tags; extended tests; updated to
  avoid breaking previous changes to support IPv6 addresses in unquoted
  attribute values)
"""


John
