Looks like revision 47154 introduced a regexp that hangs Python (Ctrl-C won't kill the process, CPU usage sits near 100%) under some circumstances. There's a test case here:
http://python.org/sf/1541697 The problem isn't seen if you read the whole file at once (or almost the whole file at once). (But that doesn't make it a non-bug, AFAICS.) I'm not sure what the problem is, but presumably the relevant part of the patch is this: +starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*(' + r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*' + r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)[EMAIL PROTECTED]' + r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?' + r')*\s*/?\s*(?=[<>])') The patch attached to bug 1515142 (also from Sam Ruby -- claims to fix a regression introduced by his recent sgmllib patches, and has not yet been applied) does NOT fix the problem. If nobody has time to fix this, perhaps rev 47154 should be reverted? commit message for -r47154: """ SF bug #1504333: sgmlib should allow angle brackets in quoted values (modified patch by Sam Ruby; changed to use separate REs for start and end tags to reduce matching cost for end tags; extended tests; updated to avoid breaking previous changes to support IPv6 addresses in unquoted attribute values) """ John _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com