Re: Problem with --reject option

2007-06-11 Thread Steven M. Schweda
From: Glenn Nieuwenhuyse

> wget -T 1 -t 1 -r --reject="robots.*" [...]
> 
> I would expect this not to download the robots.txt file, but still it
> does.

   Perhaps because "robots.txt" is a special case: it is not fetched by
following links, so it is unaffected by the --reject option.

   A search for "robot" in the manual should reveal this:

  http://www.gnu.org/software/wget/manual/wget.html

robots = on/off
 Specify whether the norobots convention is respected by Wget,
 "on" by default. This switch controls both the /robots.txt and
 the nofollow aspect of the spec. See Robot Exclusion, for more
 details about this. Be sure you know what you are doing before
 turning this off.

   So, adding "-e robots=off" to your command might help.
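
   For example (untested; this is just your original command line with
"-e robots=off" added):

      wget -T 1 -t 1 -r -e robots=off --reject="robots.*" http://150.158.230.231:1500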



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street    (+1) 651-699-9818
   Saint Paul  MN  55105-2547


Problem with --reject option

2007-06-11 Thread Glenn Nieuwenhuyse

Hi all,

I'm using wget version 1.10.2 under Windows, and I want wget to avoid
downloading/saving any files called "robots.txt". I'm using the following
command line:

wget -T 1 -t 1 -r --reject="robots.*" http://150.158.230.231:1500

I would expect this not to download the robots.txt file, but it still does.
When I look at the directory where the files are stored after the command has
finished, the robots.txt file is still present.
Can anybody help me out with this one? I'm probably missing something trivial
here.

Kind regards,

Glenn.