On Sunday, November 27, 2016 5:40:09 PM CET Sethi Badhan wrote:
> Hello
> 
> When I run plain wget in a for loop it works fine, but when I run it with
> -e robots=off it does not stop and keeps downloading pages recursively.
> Even though I have set a limit for the 'for' loop, it does not stop after
> that limit. Here is my code:
> 
> #!/bin/bash
> 
> lynx --dump https://en.wikipedia.org/wiki/Cloud_computing | awk '/http/{print $2}' | grep https://en. | grep -v '.svg\|.png\|.jpg\|.pdf\|.JPG\|.php' > Pages.txt
> grep -vwE "(http://www.enterprisecioforum.com/en/blogs/gabriellowy/value-data-platform-service-dpaas)" Pages.txt > newpage.txt
> rm Pages.txt
> egrep -v "#|$^" newpage.txt>try.txt
> awk '!a[$0]++' try.txt>new.txt
> rm newpage.txt
> rm try.txt
> mkdir -p htmlpagesnew
> cd htmlpagesnew
> j=0
> for i in $( cat ../new.txt );
> do
> if [ $j -lt 10 ];
> then
>     let j=j+1;
>     echo $j
>     wget  -N -nd -r $i -e robots=off --wait=.25 ;
> fi
> done

Maybe you don't want '-r'?

robots=off circumvents the robots.txt exclusion list... so it might download
much more (and thus perhaps 'never' stop). With -r, every single wget call
recurses through the site on its own; your for loop only limits how many
starting URLs you hand to wget, not how many pages each call then fetches.
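
Something like the following (untested sketch) keeps your other options but
drops -r, so each URL from new.txt is fetched as a single page; the file
names and the limit of 10 are just taken from your script:

#!/bin/bash
# Sketch: fetch each URL as one page instead of recursing over the site.
# Assumes ../new.txt contains one URL per line, as in the original script.
mkdir -p htmlpagesnew
cd htmlpagesnew || exit 1

j=0
while read -r i; do
    if [ "$j" -ge 10 ]; then
        break                              # stop after 10 URLs
    fi
    j=$((j + 1))
    echo "$j"
    # No -r: wget downloads only this page, so robots=off cannot
    # make it wander across the whole site.
    wget -N -nd -e robots=off --wait=.25 "$i"
done < ../new.txt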

Tim
