> for instance, the syntax for --filter presented above is basically the 
> following:
> --filter=[+|-][file|path|domain]:REGEXP

I think a file 'contents' regexp search facility would be a useful 
addition here, e.g.

--filter=[+|-]contents:REGEXP
The idea is that if the file just downloaded contains a match for the 
regular expression REGEXP (i.e. as in 'egrep REGEXP file.html'), the file 
is kept and its links are processed as normal.  If no match is found, the 
file is simply deleted.  Such a facility could be used to stop recursive 
downloads from wandering way off topic.
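The per-file rule sketched above can be modelled with a short shell fragment (FILE and REGEXP are illustrative placeholders, not real wget options, and the "downloaded" page is faked so the commands stand alone):

```shell
# Model of the proposed keep-or-delete rule for one downloaded file.
# FILE and REGEXP are hypothetical placeholders for this sketch.
FILE=page.html
REGEXP=wget
printf 'all about wget recursion\n' > "$FILE"   # stand-in for a download

# 'grep -E -q' gives the 'egrep REGEXP file' test from the proposal.
if grep -E -q "$REGEXP" "$FILE"; then
    echo "keep: $FILE (process its links as normal)"
else
    rm -- "$FILE"          # no match: discard the page, skip its links
fi
```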


wget -e robots=off -r -N -k -E -p -H http://www.gnu.org/software/wget/

soon leads to pages unrelated to wget being downloaded.

My suggestion is that with

wget -e robots=off -r -N -k -E -p -H --filter=+contents:wget http://www.gnu.org/software/wget/

any page not containing the string 'wget' is deleted and its links are not 
followed.
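Until such a filter exists, the effect can be roughly approximated after the fact with GNU find and grep by deleting mirrored pages that never mention the pattern.  Unlike the proposed filter, this runs only after the crawl, so it cannot stop wget from following the off-topic links; a mock 'mirror' directory is created here so the commands stand alone:

```shell
# Build a mock mirror directory so the cleanup can be shown offline;
# a real run would first crawl, e.g.:
#   wget -e robots=off -r -N -k -E -p -H http://www.gnu.org/software/wget/
mkdir -p mirror
printf '<p>all about wget recursion</p>\n' > mirror/ontopic.html
printf '<p>an unrelated page</p>\n' > mirror/offtopic.html

# grep -L lists files with NO match; --null plus xargs -0 keeps odd
# filenames safe.  Every HTML page not mentioning 'wget' is removed.
find mirror -name '*.html' -exec grep -L --null 'wget' {} + \
  | xargs -0 -r rm --
```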

Tom Crane
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England. 
Fax:    +44 (0) 1784 472794
