On 26 Mar 2002 at 19:01, Jens Rösner wrote:

> I am using wget to parse a local html file which has numerous links into
> the www.
> Now, I only want hosts that include certain strings like
> -H -Daudi,vw,online.de
It's probably worth noting that the comparison between the -D strings and the domains being followed (or not) is anchored at the end of each string, i.e. "-Dfoo" matches "bar.foo" but not "foo.bar".

> Two things I don't like in the way wget 1.8.1 works on windows:
>
> The first page of even the rejected hosts gets saved.

That sounds like a bug.

> This messes up my directory structure as I force directories
> (which is my default and normally useful)
>
> I am aware that wget has switched to breadth-first (as opposed to
> depth-first) retrieval.
> Now, with downloading from many (20+) different servers, this is a bit
> frustrating, as I will probably have the first completely downloaded
> site in a few days...

Would that be less of a problem if the first issue (the first page of rejected domains being saved) were fixed?

> Is there any other way to work around this besides installing wget 1.6
> (or even 1.5?)

No, but note that if you pass several starting URLs to Wget, it will complete the first before moving on to the second. The same applies to the URLs in the file specified by the --input-file parameter. However, if all the sites are interlinked, this would leave you no better off.

The other alternative is to run wget several times in sequence with different starting URLs and restrictions, perhaps using the --timestamping or --no-clobber options to avoid downloading things more than once.
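That sequential approach might look roughly like this; the URLs and domain lists below are made-up examples, not taken from the original report:

```shell
# Run wget once per site, so each site finishes before the next starts.
# --no-clobber skips files already fetched by an earlier run.
wget -r -H -Daudi.de --no-clobber http://www.audi.de/
wget -r -H -Dvw.de   --no-clobber http://www.vw.de/

# Or drive the runs from a file of starting URLs, one run per line:
# while read url; do wget -r --no-clobber "$url"; done < urls.txt
```

With --timestamping instead of --no-clobber, later runs re-fetch a file only if the server copy is newer than the local one.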
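Incidentally, the end-anchored -D comparison described above behaves roughly like the following sketch; this is not Wget's actual code, and matches_domain is a hypothetical name:

```shell
# Hypothetical sketch of an end-anchored -D comparison.
# matches_domain PATTERN HOST prints "yes" if HOST ends with PATTERN.
matches_domain() {
  case "$2" in
    *"$1") echo yes ;;  # pattern is anchored at the end of the host name
    *)     echo no  ;;
  esac
}

matches_domain foo bar.foo   # yes: "bar.foo" ends with "foo"
matches_domain foo foo.bar   # no:  "foo" is not at the end
```

Note that a plain suffix compare like this would also let "online.de" match a host such as "t-online.de", which may or may not be what you want.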