> Tobias Tiederle's regex patch:
>
> regex is a very nice feature which i am planning to include in a future
> release of wget (hopefully 1.11). but, since wget supports so many platforms,
> we have to come up with a portable implementation of regex and i don't know
> yet if from this point of view using the PCRE library as Tobias does in its
> patch is the best thing to do. i would like to hear other opinions on this
> point.

If possible, it seems preferable to me to use the platform's C library regex
support rather than make wget dependent on another library...

A useful enhancement to Tobias's scheme, if I've understood it correctly,
would be to have two new regex options (ideally with a user-selectable choice
of POSIX basic or extended syntax), e.g. --exclude-regex-contents and
--include-regex-contents, which effectively grep (or grep -E) each downloaded
HTML file and keep or delete it based on its contents. Matching on data in
meta tags or relevant text strings in the body of the HTML should make this a
powerful way to prevent wget -H, in particular, from wandering off topic when
the target pages have many links to ad sites and other unwanted stuff.

Regards
Tom Crane

-- 
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email: [EMAIL PROTECTED]
Fax: +44 (0) 1784 472794
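
PS: a rough sketch of the content filter described above, using only the
POSIX <regex.h> calls (regcomp, regexec, regfree) from the C library. The
function names contents_match and keep_page are hypothetical and are not part
of wget or of Tobias's patch. The sketch assumes the downloaded page is
already in memory as a NUL-terminated string, and it treats a pattern that
fails to compile as "no match", which a real implementation would probably
report as an error instead.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a wget function): returns 1 if `pattern`
   matches anywhere in `text`, 0 if it does not, and -1 if the pattern
   fails to compile. A nonzero `extended` selects POSIX extended syntax
   (like grep -E); zero selects basic syntax (like grep). */
int contents_match(const char *text, const char *pattern, int extended)
{
    regex_t re;
    int flags = REG_NOSUB | (extended ? REG_EXTENDED : 0);
    int rc;

    assert(text != NULL && pattern != NULL);
    if (regcomp(&re, pattern, flags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical keep/delete decision for one downloaded page.
   `include_re` and `exclude_re` stand in for the arguments of the
   proposed --include-regex-contents and --exclude-regex-contents
   options; either may be NULL when the option is not given.
   Returns 1 to keep the page, 0 to delete it. */
int keep_page(const char *html, const char *include_re,
              const char *exclude_re, int extended)
{
    /* An exclude match always wins. */
    if (exclude_re != NULL && contents_match(html, exclude_re, extended) == 1)
        return 0;
    /* If an include pattern is given, the page must match it. */
    if (include_re != NULL && contents_match(html, include_re, extended) != 1)
        return 0;
    return 1;
}
```

With this, keep_page(html, "physics", "adserver", 1) would keep a page that
mentions physics and delete one that mentions adserver, whether or not it
also mentions physics.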
