> Tobias Tiederle's regex patch:
> 
> regex is a very nice feature which i am planning to include in a future 
> release of wget (hopefully 1.11). but, since wget supports so many platforms, 
> we have to come up with a portable implementation of regex and i don't know 
> yet if from this point of view using the PCRE library as Tobias does in his 
> patch is the best thing to do. i would like to hear other opinions on this 
> point.

If possible, it seems preferable to me to use the platform's C library
regex support rather than make wget dependent on another library...  

A useful enhancement to Tobias's scheme, if I've understood it correctly,
would be to add two new regex options, e.g. --exclude-regex-contents and
--include-regex-contents (ideally with a user-selectable choice of POSIX
basic or extended syntax), which effectively grep (or grep -E) each
downloaded HTML file and keep or delete it based on its contents.  Matching
on data in meta-tags or on relevant text strings in the body of the HTML
should be a powerful way to prevent wget -H, in particular, from wandering
off topic when the target pages have many links to ad sites and other
unwanted stuff.
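
To make the idea concrete, here is a rough sketch of the content test using
only the C library's POSIX regex calls (regcomp/regexec/regfree).  The
function names contents_match and keep_document are my own invention for
illustration and are not part of wget; the two patterns stand in for the
arguments of the proposed (not yet existing) --include-regex-contents and
--exclude-regex-contents options, and the whole page is assumed to be
already in memory as a NUL-terminated string.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a wget function): true if TEXT contains a
   match for the POSIX regex PATTERN.  EXTENDED selects extended syntax
   (like grep -E) instead of basic syntax (like grep).  A pattern that
   fails to compile is treated as "no match".  */
static bool
contents_match (const char *pattern, const char *text, bool extended)
{
  regex_t re;
  int flags = REG_NOSUB | (extended ? REG_EXTENDED : 0);
  bool matched;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, flags) != 0)
    return false;
  matched = (regexec (&re, text, 0, NULL, 0) == 0);
  regfree (&re);
  return matched;
}

/* Hypothetical policy: delete a page if it matches the exclude pattern,
   or if an include pattern was given and the page does not match it.
   Either pattern may be NULL, meaning "option not given".  */
static bool
keep_document (const char *html, const char *include_re,
               const char *exclude_re, bool extended)
{
  if (exclude_re != NULL && contents_match (exclude_re, html, extended))
    return false;
  if (include_re != NULL && !contents_match (include_re, html, extended))
    return false;
  return true;
}
```

With something like this, keep_document (page, "physics|optics", "casino",
true) would keep only pages that mention physics or optics and do not
mention casino.  A real implementation would presumably compile each
pattern once at option-parsing time rather than once per page.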

Regards
Tom Crane
-- 
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England. 
Email:  [EMAIL PROTECTED]
Fax:    +44 (0) 1784 472794
