Hi there,

I have a rather puzzling problem with only some sites.

Short version: For some sites, I cannot retrieve HTML files if these
files already exist on my hard disk.

Long version:

I try to mirror www.grc.com with the command
  wget http://www.grc.com --mirror
and get
  ERROR 403: Forbidden.
for the file www.grc.com/index.html

This has worked before, so I already have the file www.grc.com\index.html
on my disk. If I delete this file, things work all right and processing
continues up to the next HTML file. That one fails again if I already
have it on my disk.

Browsing this site with Netscape (same proxy) is not a problem.
At first I thought the site was blocking wget, but changing the
user agent in wget made no difference.

I get the Error 403 in wget only for html files, never for other files.

So far I have found two sites with this problem, www.grc.com
and www.sysinternals.com.
Other sites -> no problem.

Can someone tell me please what is going on? Can I solve this
problem here locally, or do I need to contact the webmasters of these
sites, asking for a change? If so, which change?

Thanks for any advice

Ulrich

PS: In case it matters, I am using 'GNU Wget 1.7.1-pre1',
    running on WinNT 4.0, SP6.
-- 
"Unix gives you just enough rope to hang yourself -- and then a
couple of more feet, just to be sure." (Eric Allman)
