Paolo Lunazzi wrote:
> Hi everybody!
> I have a particular problem with wget, because I have to
> meet a strange requirement.
> Starting from the homepage of a website, I would like to
> download the first 100 pages found by a breadth-first
> visit of the links. Is it possible to do this with wget?
> Reading the man page, I have found nothing about this :(
Hi! You were on the IRC channel, right? IIRC, you needed this for evaluating the accessibility of a website, and 100 pages was meant to give a statistically viable sample.

Wget doesn't let you download up to a limit of a specific number of pages, and I'd probably need to see more use cases for such a thing before I'd be willing to write it into the main code. It'd be a pretty easy feature to patch in, though.

However, as I pointed out on IRC, getting the first 100 pages is _not_ going to give you a statistically viable sample, because you're not choosing them at random. It's entirely plausible for the worst accessibility offenders to be the deepest links, and I believe Wget takes a breadth-first approach to recursive downloading, so the first 100 pages would be biased toward the shallowest ones. You'd do much better to map or fetch the entire web tree, and then select 100 pages from _those_, completely at random.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
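P.S. A rough sketch of that mirror-then-sample approach in shell. The real wget invocation is shown only as a comment (it needs network access, and example.com is a placeholder for your site); the 500 dummy pages below just stand in for the result of a mirror run so the sampling step can be demonstrated:

```shell
# Step 1: mirror the whole tree first. Against a real site this would be
# something like:
#   wget --mirror --no-parent http://example.com/
# Here we simulate the mirrored tree with 500 dummy pages instead:
mkdir -p example.com
for i in $(seq 1 500); do touch "example.com/page$i.html"; done

# Step 2: pick 100 pages uniformly at random from the full tree,
# using shuf from GNU coreutils:
find example.com -name '*.html' | shuf -n 100 > sample.txt

wc -l < sample.txt   # prints 100
```

Run your accessibility checks against the URLs in sample.txt; since they were drawn at random from the whole tree rather than taken breadth-first, the depth bias goes away.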
