On 26 Mar 2002 at 19:01, Jens Rösner wrote:

> I am using wget to parse a local html file which has numerous links into
> the www.
> Now, I only want hosts that include certain strings like
> -H -Daudi,vw,online.de
It's probably worth noting that the comparison between the -D strings and the domains being followed (or not) is anchored at the end of each string, i.e. "-Dfoo" matches "bar.foo" but not "foo.bar".

> Two things I don't like in the way wget 1.8.1 works on windows:
>
> The first page of even the rejected hosts gets saved.

That sounds like a bug.

> This messes up my directory structure as I force directories
> (which is my default and normally useful)
>
> I am aware that wget has switched to breadth-first (as opposed to
> depth-first) retrieval.
> Now, with downloading from many (20+) different servers, this is a bit
> frustrating, as I will probably have the first completely downloaded
> site in a few days...

Would that be less of a problem if the first issue (the first page of rejected domains being saved) were fixed?

> Is there any other way to work around this besides installing wget 1.6
> (or even 1.5?)

No, but note that if you pass several starting URLs to Wget, it will complete the first before moving on to the second. The same applies to the URLs in the file specified by the --input-file parameter. However, if all the sites are interlinked, this would leave you no better off.

The other alternative is to run wget several times in sequence with different starting URLs and restrictions, perhaps using the --timestamping or --no-clobber options to avoid downloading things more than once.
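That sequential approach might look roughly like this; the URLs and domain lists below are made-up examples, not taken from the original report:

```shell
# Run wget once per site, so each site finishes before the next starts.
# --no-clobber skips files already fetched by an earlier run.
wget -r -H -Daudi.de --no-clobber http://www.audi.de/
wget -r -H -Dvw.de   --no-clobber http://www.vw.de/

# Or drive the runs from a file of starting URLs, one run per line:
# while read url; do wget -r --no-clobber "$url"; done < urls.txt
```

With --timestamping instead of --no-clobber, later runs re-fetch a file only if the server copy is newer than the local one.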
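Incidentally, the end-anchored -D comparison described above behaves roughly like the following sketch; this is not Wget's actual code, and matches_domain is a hypothetical name:

```shell
# Hypothetical sketch of an end-anchored -D comparison.
# matches_domain PATTERN HOST prints "yes" if HOST ends with PATTERN.
matches_domain() {
  case "$2" in
    *"$1") echo yes ;;  # pattern is anchored at the end of the host name
    *)     echo no  ;;
  esac
}

matches_domain foo bar.foo   # yes: "bar.foo" ends with "foo"
matches_domain foo foo.bar   # no:  "foo" is not at the end
```

Note that a plain suffix compare like this would also let "online.de" match a host such as "t-online.de", which may or may not be what you want.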