Paolo Lunazzi wrote:
> Hi everybody!
> I have a particular problem with wget because I have to
> satisfy a strange requirement.
> Starting from the homepage of a website, I would like to
> download the first 100 pages found in a breadth-first
> traversal of the links. Is it possible to do this with
> wget? Reading the man page, I have not found anything
> about it :(

Hi!

You were on the IRC channel, right?

IIRC, you needed this to evaluate the accessibility of a website, and
the 100 pages were meant to give a statistically viable sample.

Wget doesn't offer a way to stop a download after a specific number of
pages, and I'd probably need to see more use cases for such a thing
before I'd be willing to write it into the main code. It'd be a pretty
easy feature to patch in, though.
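Just to illustrate the semantics such a feature would have (this is a
rough sketch, not Wget code: the function name crawl_first_n, the
injectable fetch() callback, and example.com are all my inventions),
a breadth-first crawl capped at N pages looks roughly like this:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_first_n(start_url, fetch, limit=100):
    """Visit pages breadth-first from start_url, stopping after `limit`
    pages. `fetch(url)` must return the page's HTML as a string; the
    crawl stays on the start URL's host and skips already-seen URLs."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(url))
        visited.append(url)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

In real use, fetch() would wrap an HTTP client; the point is only that
the queue discipline (FIFO) is what makes the traversal breadth-first.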

However, as I pointed out on IRC, fetching the first 100 pages is _not_
going to get you a statistically viable sample, because you're not
choosing them at random. It's entirely plausible for the worst
accessibility offenders to be the most deeply linked pages, whereas I
believe Wget takes a breadth-first approach to recursive downloading.
You'd do much better to map or fetch the entire web tree, and then
select 100 pages from it, completely at random.
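The sampling step itself is tiny. A sketch (sample_pages is my name for
it; the input would be the full URL list from whatever mapped the site,
e.g. a complete recursive `wget --spider` run post-processed into one
URL per line):

```python
import random

def sample_pages(all_urls, k=100, seed=None):
    """Pick k pages uniformly at random, without replacement, from the
    complete list of a site's URLs. Every page has the same chance of
    inclusion regardless of how deep it sits in the link tree."""
    rng = random.Random(seed)
    return rng.sample(all_urls, min(k, len(all_urls)))
```

That uniformity is exactly what a first-100 crawl can't give you.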

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
