Jamie Zawinski <[EMAIL PROTECTED]> writes:
[...]
> It's downloading about every 4th subdirectory under gallery/2001/;
> if you look at the index.html file there, you'll see that all links
> are in identical syntax, so I don't see why it's downloading 07-13/
> but skipping 07-14/.
> 
> And then, strangely, if I leave off the flyers/ URL on the command
> line, it downloads more of the gallery/ directories -- but not all
> of them.

I'm not sure exactly what is going on here, but my guess is that
Wget 1.7 was overzealous about remembering which URLs are
"undesirable" to load, and so one recursive download hosed the
other.  Or something like that.

I've rewritten the recursive download code for 1.8 to traverse the
links breadth-first, and to be much less zealous about blacklisting
URLs that it chose not to download at some arbitrary point.  I believe
the bug you saw has been fixed.  I've now tried your test case and I
got this:

$ find flyers gallery -type d | sort
flyers
flyers/2001
flyers/2001/07
flyers/2001/08
flyers/2001/09
flyers/2001/10
flyers/2001/11
flyers/2001/12
gallery
gallery/2001
gallery/2001/07-13
gallery/2001/07-14
gallery/2001/07-28
gallery/2001/08-01
gallery/2001/08-04
gallery/2001/08-10
gallery/2001/08-17
gallery/2001/08-31
gallery/2001/09-01
gallery/2001/09-16
gallery/2001/09-20
gallery/2001/09-23
gallery/2001/10-05
gallery/2001/10-14
gallery/2001/10-31
gallery/2001/11-15

It downloads 978 JPG and 9 GIF files in total.
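
To illustrate the general idea of a breadth-first traversal that does
not blacklist rejected links, here is a minimal sketch in Python.  It
is not Wget's actual code (Wget is written in C), and every name in it
(crawl_breadth_first, get_links, should_follow, the toy "site" table)
is hypothetical.  It assumes that only URLs actually queued are
remembered, so a link rejected in one context can still be accepted
when it turns up again elsewhere.

```python
from collections import deque


def crawl_breadth_first(start_urls, get_links, should_follow, max_depth=5):
    """Visit URLs level by level and return them in visit order.

    get_links(url) returns the links found on a page; should_follow(link,
    referrer) decides whether to queue a link.  Both are hypothetical
    callbacks supplied by the caller, not part of any real Wget API.
    """
    queue = deque()
    seen = set()  # only URLs that were actually queued
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    order = []
    while queue:
        url, depth = queue.popleft()  # FIFO order gives breadth-first
        order.append(url)
        if depth >= max_depth:
            continue
        for link in get_links(url):
            if link in seen:
                continue
            # A rejected link is skipped but NOT recorded anywhere, so a
            # rejection here cannot affect a later page offering the same link.
            if not should_follow(link, url):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return order


# Toy link table standing in for real pages (made up for this example).
site = {
    "flyers/": ["flyers/2001/"],
    "gallery/": ["gallery/2001/"],
    "gallery/2001/": ["gallery/2001/07-13/", "gallery/2001/07-14/"],
}

order = crawl_breadth_first(
    ["flyers/", "gallery/"],
    lambda url: site.get(url, []),
    lambda link, referrer: True,
)
print(order)
```

With two starting URLs, both top-level directories are visited before
any of their subdirectories, which is the property that matters here.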

Wget 1.8 is officially in beta, but it's quite stable.  If you want to
see whether it works for you, grab it from here:

    ftp://gnjilux.srk.fer.hr/pub/unix/util/wget/.betas/wget-1.8-beta1.tar.gz

Thanks for the report.
