wget --spider issue

2003-09-10 Thread Andreas Belitz
Hi,

  I have found a problem regarding wget --spider.

  It works great for files over HTTP or FTP, but as soon as one of
  these two conditions occurs, wget starts downloading the file:

  1. linked files (I'm not 100% sure about this)
  2. download scripts (e.g. http://www.nothing.com/download.php?file=12345)

  I have included one link that starts downloading even when using the
  --spider option:

  
http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388
  (MoBo BIOS file)

  So this actually starts downloading:

  $ wget --spider 'http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388'

  If there is no solution to this problem using wget, can anyone
  recommend another link verifier? What I want to do is check the
  existence of some 200k links stored in a database. So far I have been
  trying to use /usr/bin/wget --spider \' . $link . \' 2>&1 | tail -2 | head -1
  in a simple PHP script.
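
  In outline, the per-link check I am driving from PHP boils down to
  roughly this shell loop (a sketch; links.txt stands in for the links
  exported from the database):

  while read -r link; do
    # one wget call per link; keep only the line that reports success or failure
    /usr/bin/wget --spider "$link" 2>&1 | tail -2 | head -1
  done < links.txt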

  Thanks for any help!


-- 
Best Regards,

Andreas Belitz
CIO

TCTK - Database Solutions
Nordanlage 3
35390 Giessen
Germany

Phone: +49 (0) 641 3019 446
Fax  : +49 (0) 641 3019 535
Mobile   : +49 (0) 176 700 16161

E-mail   : mailto:[EMAIL PROTECTED]
Internet : http://www.tctk.de



Re: wget --spider issue

2003-09-10 Thread Aaron S. Hawley
On Wed, 10 Sep 2003, Andreas Belitz wrote:

 Hi,

   I have found a problem regarding wget --spider.

   It works great for files over HTTP or FTP, but as soon as one of
   these two conditions occurs, wget starts downloading the file:

   1. linked files (I'm not 100% sure about this)
   2. download scripts (e.g. http://www.nothing.com/download.php?file=12345)

   I have included one link that starts downloading even when using the
   --spider option:

 http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388
   (MoBo BIOS file)

   So this actually starts downloading:

 $ wget --spider 'http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388'

Actually, what you call download scripts are HTTP redirects. In this case
the redirect points to an FTP server, and if you double-check I think
you'll find that Wget does not know how to spider over FTP.
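
You can see the redirect for yourself by asking Wget to print the server
response headers (-S / --server-response); spider mode keeps it from
saving anything:

$ wget -S --spider 'http://club.aopen.com.tw/downloads/Download.asp?RecNo=3587&Section=5&Product=Motherboards&Model=AX59%20Pro&Type=Manual&DownSize=8388'

If it really is a redirect, you should see a Location: header in the
output showing where the script sends you.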

   If there is no solution to this problem using wget, can anyone
   recommend another link verifier? What I want to do is check the
   existence of some 200k links stored in a database. So far I have been
   trying to use /usr/bin/wget --spider \' . $link . \' 2>&1 | tail -2 | head -1
   in a simple PHP script.

I do something similar with Wget (using shell scripting instead), and I am
pleased with the outcome.  Since you are calling Wget for each link anyway,
and Wget does a good job of returning success or failure in its exit
status, you can simply do this:

wget --spider "$link" || echo "$link" >> badlinks.txt
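
Wrapped in a loop over the whole list, that becomes something like this
(a sketch; links.txt stands in for one URL per line pulled from your
database):

while read -r link; do
  # -q keeps wget quiet; the exit status alone tells us whether the link is good
  wget -q --spider "$link" || echo "$link" >> badlinks.txt
done < links.txt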

I can send you my shell scripts if you're interested.
/a


   Thanks for any help!

-- 
Our armies do not come into your cities and lands as conquerors or
enemies, but as liberators.
 - British Lt. Gen. Stanley Maude. Proclamation to the People of the
   Wilayat of Baghdad. March 8, 1917.


Re: wget --spider issue

2003-09-10 Thread Andreas Belitz
Hi Aaron S. Hawley,

On Wed, 10. September 2003 you wrote:

ASH Actually, what you call download scripts are HTTP redirects. In this case
ASH the redirect points to an FTP server, and if you double-check I think
ASH you'll find that Wget does not know how to spider over FTP.

Ok. This seems to be the reason. Thanks. Is there any way to make wget
spider FTP addresses?

ASH I can send you my shell scripts if you're interested.
ASH /a

That would be great!


-- 
Kind regards,

Andreas Belitz
CIO

TCTK - Database Solutions
Nordanlage 3
35390 Giessen
Germany

Phone: +49 (0) 641 3019 446
Fax  : +49 (0) 641 3019 535
Mobile   : +49 (0) 176 700 16161

E-mail   : mailto:[EMAIL PROTECTED]
Internet : http://www.tctk.de



Re: wget --spider issue

2003-09-10 Thread Aaron S. Hawley

On Wed, 10 Sep 2003, Andreas Belitz wrote:

 Hi Aaron S. Hawley,

 On Wed, 10. September 2003 you wrote:

 ASH Actually, what you call download scripts are HTTP redirects. In this case
 ASH the redirect points to an FTP server, and if you double-check I think
 ASH you'll find that Wget does not know how to spider over FTP.

 Ok. This seems to be the reason. Thanks. Is there any way to make wget
 spider FTP addresses?

I sent a patch to this list over the winter.  It's included with the shell
scripts I spoke of, which are attached to this message.

 ASH I can send you my shell scripts if you're interested.
 ASH /a

 That would be great!

gnurls-0.1.tar.gz
Description: Binary data