Path-ascending crawler

Frank McCown Wed, 21 Dec 2005 08:33:51 -0800

A path-ascending crawler is one that, when given the URL http://foo.org/a/b/page.html, will attempt to crawl each of the following:


http://foo.org/a/b/page.html
http://foo.org/a/b/
http://foo.org/a/
http://foo.org/

This will increase the ability of the crawler to find resources that are not linked to by other resources, giving a more complete picture of the actual contents of a web server. See "Web-Crawling Reliability" by Viv Cothey (2004) for more info.
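
To make the idea concrete, here is a rough Python sketch of how the list of URLs above could be derived from a starting URL. The function name ascending_urls is made up for illustration; this is not how wget works today, and it makes no claim about how Cothey's crawler is implemented. It assumes an absolute http(s) URL and drops any query string or fragment when building the parent URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def ascending_urls(url):
    """Return url followed by each parent directory URL, ending at the site root.

    Hypothetical helper for illustration only; assumes an absolute URL.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    urls = [url]
    # Strip one trailing path segment per iteration until only "/" remains.
    while path != "/":
        trimmed = path.rstrip("/")
        # Keep everything up to and including the last slash; fall back to
        # the root so that odd paths such as "//" cannot loop forever.
        path = trimmed[: trimmed.rfind("/") + 1] or "/"
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return urls


if __name__ == "__main__":
    for u in ascending_urls("http://foo.org/a/b/page.html"):
        print(u)
```

A crawler would add each of these URLs to its queue in addition to the links it discovers in fetched pages.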


It would be nice to have this functionality in wget.  Something like:

wget -r -path-ascend http://foo.org/

What do you guys think?

Frank
