On 6/21/05, Isaac Grover <[EMAIL PROTECTED]> wrote:
> > > I wonder if someone on the list could come up with a sed one-liner?
> > > Or a snippet of perl perhaps. It should be trivial to take a
> > > directory of html files, extract html tags that bracket each URL that
> > > mention a PDF file, and write a pseudo-HTML file that contains only
> > > the PDF links for wget.
>
> I don't know sed, and it wouldn't be hard to do in perl I suppose, but this
> is more or less what I use:
>
> #!/bin/sh
>
> wget http://www.example.com/links/
> grep "http://" index.html > index.txt
> cat index.txt | awk 'BEGIN { FS="\"" } { print $2 }' > url_list.txt
>
> Then if you wanted to only grab the PDF files, do:
>
> grep "\.pdf" url_list.txt > new_url_list.txt
> wget -i new_url_list.txt
>
> It is just after midnight here, so it may not work exactly as advertised,
> but cut-n-paste usually doesn't lie, so it should work okay.
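[A minimal sketch of the extraction step discussed above, combining the grep and awk stages into one pipeline. It assumes hrefs are double-quoted, one per line; page.html and the example URLs here are made up for illustration, and grep's -o flag requires GNU grep.]

```shell
#!/bin/sh
# Hypothetical local copy of the index page, for illustration only.
cat > page.html <<'EOF'
<a href="http://www.example.com/docs/a.pdf">A</a>
<a href="http://www.example.com/docs/b.html">B</a>
EOF

# Keep only double-quoted hrefs ending in .pdf, then strip the
# href="..." attribute syntax to leave bare URLs, one per line.
grep -o 'href="[^"]*\.pdf"' page.html | sed 's/^href="//; s/"$//' > pdf_urls.txt

cat pdf_urls.txt
# The resulting list can then be fed to wget:  wget -i pdf_urls.txt
```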
Thanks, Isaac, but as far as I understand your script, it doesn't combine with wget's recursive retrieval.

Paul
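[For the recursive case, wget's own accept-list may already cover this, per its manual; the URL below is the example host from the script above. A sketch, not tested here since it needs network access:]

```
# -r recurse, -l 1 limit depth to one level, -np don't ascend to the
# parent directory, -A '*.pdf' keep only files matching the pattern.
# Note wget still downloads the HTML pages to follow their links,
# then deletes the ones that don't match the accept list.
wget -r -l 1 -np -A '*.pdf' http://www.example.com/links/
```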
