On Sunday, November 27, 2016 5:40:09 PM CET Sethi Badhan wrote:
> Hello,
>
> When I simply run wget in a for loop it works fine, but when I run it
> with -e robots=off it does not stop: it keeps downloading pages
> recursively, ignoring the limit I set on the for loop. Here is my code:
>
> #!/bin/bash
>
> lynx --dump https://en.wikipedia.org/wiki/Cloud_computing | awk '/http/{print $2}' | grep https://en. | grep -v '.svg\|.png\|.jpg\|.pdf\|.JPG\|.php' > Pages.txt
> grep -vwE "(http://www.enterprisecioforum.com/en/blogs/gabriellowy/value-data-platform-service-dpaas)" Pages.txt > newpage.txt
> rm Pages.txt
> egrep -v "#|$^" newpage.txt > try.txt
> awk '!a[$0]++' try.txt > new.txt
> rm newpage.txt
> rm try.txt
> mkdir -p htmlpagesnew
> cd htmlpagesnew
> j=0
> for i in $( cat ../new.txt ); do
>     if [ $j -lt 10 ]; then
>         let j=j+1
>         echo $j
>         wget -N -nd -r $i -e robots=off --wait=.25
>     fi
> done
Maybe you don't want '-r'? robots=off circumvents the robots.txt exclusion list, so with '-r' wget may download much more than you expect (and thus perhaps 'never' stop). Your for loop does run only 10 times, but each wget call it launches is itself an unbounded recursive crawl.
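If each URL in new.txt is already a single page you want, dropping '-r' avoids the crawl entirely; if you do want some recursion, '-l 1' with '--no-parent' caps the depth. A rough sketch (untested, keeping your counter logic):

#!/bin/bash
j=0
while read -r i; do
    if [ "$j" -lt 10 ]; then
        j=$((j + 1))
        echo "$j"
        # No -r: wget fetches only this one page, so robots=off
        # cannot trigger an unbounded site crawl. For shallow
        # recursion instead, something like:
        #   wget -N -nd -r -l 1 --no-parent -e robots=off --wait=.25 "$i"
        wget -N -nd --wait=.25 "$i"
    fi
done < ../new.txt

Tim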