A path-ascending crawler is one that, when given the URL
http://foo.org/a/b/page.html, will attempt to crawl each of:
http://foo.org/a/b/page.html
http://foo.org/a/b/
http://foo.org/a/
http://foo.org/
This will increase the ability of the crawler to find resources that are
not linked to by other resources, giving a more complete picture of the
actual contents of a web server. See "Web-Crawling Reliability" by Viv
Cothey (2004) for more info.
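
To make the idea concrete, here is a rough Python sketch of how the list of URLs to try could be derived from a starting URL. This is only an illustration, not how wget works internally; the function name ascending_urls is made up for this example, and it assumes the query string and fragment are simply dropped when ascending.

```python
from urllib.parse import urlsplit, urlunsplit


def ascending_urls(url):
    """Return the given URL followed by each parent directory up to the root.

    Hypothetical helper for illustration only. Assumption: the query string
    and fragment of the starting URL are ignored when building parent URLs.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    urls = [url]
    # Strip one trailing path segment at a time until only "/" remains.
    while path != "/":
        path = path.rstrip("/").rsplit("/", 1)[0] + "/"
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return urls


if __name__ == "__main__":
    for candidate in ascending_urls("http://foo.org/a/b/page.html"):
        print(candidate)
```

A crawler would add each of these to its queue and skip any it has already visited, so the extra cost per page is at most a handful of requests.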
It would be nice to have this functionality in wget. Something like:
wget -r --path-ascend http://foo.org/
What do you guys think?
Frank