On Wed, 10 Sep 2003, Andreas Belitz wrote:

> Hi,
>
>   I have found a problem regarding wget --spider.
>
>   It works great for any files over http or ftp, but as soon as one of
>   these two conditions occurs, wget starts downloading the file:
>
>   1. linked files (I'm not 100% sure about this)
>   2. download scripts (i.e. http://www.nothing.com/download.php?file=12345&;)
>
>   I have included one link that starts downloading even if using the
>   --spider option:
>
>   
> http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388
>   (MoBo Bios file);
>
>   so this actually starts downloading:
>
>   $ wget --spider 
> 'http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388'

Actually, what you call "download scripts" are HTTP redirects. In this
case the redirect points to an FTP server, and if you double-check I
think you'll find that Wget does not know how to spider FTP.
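
You can see the redirect for yourself by asking Wget to print the
server's response headers with -S (--server-response); assuming the
problem is what I describe, the Location: header in the output should
point at an ftp:// URL:

  wget --spider -S 'http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388'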

>   If there is no conclusion to this problem using wget, can anyone
>   recommend another "Link-Verifier"? What I want to do is: check the
>   existence of some 200k links stored in a database. So far I was trying
>   to use "/usr/bin/wget --spider \'" . $link . "\' 2>&1 | tail -2 | head
>   -1" in a simple PHP script.

I do something similar with Wget (using shell scripting instead), and I am
pleased with the outcome.  Since you are calling Wget once per link anyway,
and Wget reliably reports success or failure through its exit status, you
can simply do this:

"wget --spider '$link' || echo '$link' >> badlinks.txt"

I can send you my shell scripts if you're interested.
/a

>
>   Thanks for any help!

-- 
"Our armies do not come into your cities and lands as conquerors or
enemies, but as liberators."
 - British Lt. Gen. Stanley Maude. "Proclamation to the People of the
   Wilayat of Baghdad". March 8, 1917.
