On 6/21/05, Isaac Grover <[EMAIL PROTECTED]> wrote:
> > > I wonder if someone on the list could come up with a sed one-liner?
> > > Or a snippet of perl perhaps.  It should be trivial to take a
> > > directory of html files, extract html tags that bracket each URL that
> > > mention a PDF file, and write a pseudo-HTML file that contains only
> > > the PDF links for wget.
> 
> I don't know sed, and it wouldn't be hard to do in perl I suppose, but this
> is more or less what I use:
> 
> #!/bin/sh
> 
> wget http://www.example.com/links/
> grep "http://" index.html > index.txt
> awk 'BEGIN { FS="\"" } { print $2 }' index.txt > url_list.txt
> 
> Then if you wanted to only grab the PDF files, do:
> 
> grep "\.pdf" url_list.txt > new_url_list.txt
> wget -i new_url_list.txt
> 
> It is just after midnight here, so it may not work exactly as advertised,
> but cut-n-paste usually doesn't lie, so it should work okay.

Thanks, Isaac, but as far as I understand your script, it does not
work together with wget's recursive retrieval.
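
For the recursive case, wget can actually do the PDF filtering itself with its accept list, so no link-extraction script is needed. A sketch (the URL and the depth of 2 are placeholders, not anything from the thread; needs network access to run):

```shell
# -r           recurse into linked pages
# -l 2         follow links at most two levels deep
# -np          never ascend above the starting directory
# -A '*.pdf'   keep only files matching the pattern; the HTML
#              pages are still fetched to follow their links,
#              then deleted
wget -r -l 2 -np -A '*.pdf' http://www.example.com/links/
```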

Paul
