On 26 Mar 2002 at 20:24, Jens Rösner wrote:

> Hi Ian!

Hi Jens!

> > > The first page of even the rejected hosts gets saved.
> > That sounds like a bug.
> Should I try to get a useful debug log?
> (It is Windows, so I do not know if it is helpful.)
A debug log would be useful if you can produce one. However, see the discussion below about the --input-file option: what I initially assumed to be a bug may be a feature (or at least a misfeature). Also note that if you receive cookies that expire around 2038 with debugging on, the Windows version of Wget will crash! (This is a known bug with a known fix, but it has not yet been finalised in CVS.)

> [depth first]
> > > Now, with downloading from many (20+) different servers, this is
> > > a bit frustrating, as I will probably have the first completely
> > > downloaded site in a few days...
> > Would that be less of a problem if the first problem (first page
> > from rejected domains) was fixed?
> Not really, the problems are quite different for me.

Oh well, it was just a thought!

> > > Is there any other way to work around this besides installing
> > > wget 1.6 (or even 1.5?)
> > No,
> I just installed 1.7.1, which also works breadth-first.

(I think you mean depth-first.) Yes, 1.7.1 was the last version that used depth-first retrieval.

There are advantages and disadvantages to both types of retrieval. One of the reasons for the switch was that Wget's measurement of the 'depth' of links did not work well for highly interconnected web sites (for example, an on-line manual with separate pages for each section, 'next' and 'previous' links and a 'contents' page). Limited-depth retrievals misbehaved because a page could be both deeply nested and shallowly nested at the same time. The true measure of a page's depth is the minimum length of a path to it, not the length of the first path encountered (which is what Wget's depth-first algorithm was effectively using). The breadth-first approach, by its very nature, sees the shortest path to a page before any longer paths, and so neatly avoids the problem.
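To make the difference concrete, here is a toy sketch (in Python, not Wget's actual code) of the two traversals over a manual-style site: a contents page links to every section, and each section links to the next. The graph, page names and functions are invented for illustration.

```python
from collections import deque

def bfs_depths(graph, start):
    """Breadth-first traversal: the depth recorded for each page is
    the length of the *shortest* path from the start page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in graph.get(page, []):
            if link not in depths:
                depths[link] = depths[page] + 1
                queue.append(link)
    return depths

def dfs_depths(graph, start, depth=0, depths=None):
    """Depth-first traversal: records the depth of the *first*
    encountered path to each page, which is how the old algorithm
    could over-estimate depth."""
    if depths is None:
        depths = {}
    if start in depths:
        return depths
    depths[start] = depth
    for link in graph.get(start, []):
        dfs_depths(graph, link, depth + 1, depths)
    return depths

# Contents links to every section; each section links to the next.
site = {
    "contents": ["sec1", "sec2", "sec3"],
    "sec1": ["sec2"],
    "sec2": ["sec3"],
    "sec3": [],
}
```

With this graph, depth-first reaches sec3 via contents → sec1 → sec2 → sec3 and records depth 3, even though contents links to it directly; breadth-first records the true minimum depth of 1, so a depth limit of 1 behaves as a user would expect.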
> > The other alternative is to run wget several times in sequence with
> > different starting URLs and restrictions, perhaps using the
> > --timestamping or --no-clobber options to avoid downloading things
> > more than once.
> Of course, this is possible. I just had hoped that combining
> -F -i url.html
> with domain acceptance would save me a lot of time.

Oh, I think I see what your first complaint is now. I initially assumed that your local HTML file was being served by a local HTTP server rather than being fed to the -F and -i options. Is your complaint really that URLs supplied on the command line or via the -i option are not subjected to the acceptance/rejection rules? That does indeed seem to be the current behaviour, but there is no particular reason why we couldn't apply the tests to these URLs as well as to the URLs obtained through recursion.
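For what it's worth, the change being discussed amounts to something like the following sketch (Python for clarity, not Wget's C source): apply the same domain-suffix test that -D already applies to recursively discovered links to the seed URLs from -i as well. The URLs and helper name here are invented for illustration.

```python
from urllib.parse import urlparse

def host_accepted(url, accept_domains):
    """Return True if the URL's host matches one of the accepted
    domains, using a suffix test similar in spirit to Wget's -D
    option (e.g. 'www.gnu.org' matches 'gnu.org')."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d)
               for d in accept_domains)

# Hypothetical seed list as it might come from '-i url.html';
# currently Wget downloads all of these regardless of -D.
seeds = [
    "http://www.gnu.org/software/wget/",
    "http://other.example.com/page.html",
]
accepted = [u for u in seeds if host_accepted(u, ["gnu.org"])]
```

Filtering the seeds this way would make -i behave consistently with recursion: only the gnu.org URL survives the accept list above, and the first page of rejected hosts would never be fetched at all.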