The regular expression to remove scripts from an HTML/XHTML file is simple enough, but it raises a major performance issue.

The following regular expression:

    r'(<script(\s*\S+\s*)+</script>)'

takes ages to complete in Python on a simple HTML file: more than 3 minutes of CPU time on a 150-line HTML file. In Jython it never completes at all, and instead raises a painful RunTimeException: maximum number of ??? reached.

Is the only way out to work with plain strings and "match" instead of regular expressions?

Moreover, Jython is not yet 2.3 compliant, so the advanced features of 2.3 regular expressions are not yet available!

\T,

Thomas SMETS wrote:
|
| Dear,
|
| I need to parse XHTML/HTML files in all ways:
|   _ Removing comments and javascripts is a first issue
|   _ Retrieving the list of fields to submit is my following item (todo)
|
| Any idea where I could find this already made ... ?
|
| \T,
|

-- 
Thomas SMETS
Bruxelles
@ : [EMAIL PROTECTED]
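
A note on the slow pattern above: in (\s*\S+\s*)+ one quantifier is nested inside another, so the same run of text can be split between iterations in a very large number of ways, and a backtracking engine may try most of them before it settles on (or gives up on) a </script>. The following is only a minimal sketch of one possible alternative, not a tested replacement for a real HTML parser. It assumes a current Python 3 interpreter rather than the 2.3-era Python or Jython discussed here, and the names SCRIPT_RE, COMMENT_RE and strip_scripts_and_comments are made up for illustration.

```python
import re

# Hypothetical names, for illustration only.
# "[^>]*?" cannot run past the end of the opening tag, and ".*?" is a single
# lazy quantifier that stops at the first closing tag, so there is no nested
# quantifier for the engine to backtrack through.
SCRIPT_RE = re.compile(
    r'<script\b[^>]*?(?:/>|>.*?</script\s*>)',  # self-closing or paired tag
    re.DOTALL | re.IGNORECASE,
)
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)


def strip_scripts_and_comments(html):
    """Return html with <script> elements and <!-- --> comments removed."""
    # Scripts go first, because script bodies are often wrapped in comments.
    html = SCRIPT_RE.sub('', html)
    return COMMENT_RE.sub('', html)


if __name__ == '__main__':
    page = (
        '<html><!-- header -->'
        '<script type="text/javascript">\nalert(1);\n</script>'
        '<p>hi</p></html>'
    )
    print(strip_scripts_and_comments(page))
```

This does not address the second item (listing the form fields to submit), and a regular expression will still be fooled by things like a literal "</script>" inside a JavaScript string, so a proper HTML parser is likely the safer route for anything beyond simple pages.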