On 12/14/11 3:15 AM, Boris Zbarsky wrote:
Yeah, understood. Working on getting that description.
Ok. It's just a simple spider that starts with the list at http://code.google.com/p/httparchive/source/browse/trunk/lists/All.txt and, for each URL on that list, loads the URL itself and then follows all same-host links from that page. So it loads the front page of each site plus all the same-host pages one level deep.
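
For illustration only, here is a rough Python sketch of that crawl logic. It is not the actual spider code: the names same_host_links, pages_to_load, and fetch are made up for the example, and "same-host" is assumed to mean an exact match on the URL's network location (host plus port), which may be stricter or looser than what the real spider does.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collects the raw href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url, html):
    """Return absolute same-host links from html, in document order, no duplicates."""
    # Assumption: "same host" means an identical netloc (case-insensitive).
    host = urlsplit(page_url).netloc.lower()
    parser = _LinkCollector()
    parser.feed(html)

    seen, links = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() != host:
            continue  # skip links to other hosts
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def pages_to_load(start_url, fetch):
    """URLs loaded for one list entry: the page itself, then its same-host links.

    fetch is a hypothetical callable that takes a URL and returns its HTML text.
    """
    urls = [start_url]
    for link in same_host_links(start_url, fetch(start_url)):
        if link not in urls:
            urls.append(link)
    return urls
```

Calling pages_to_load once per entry in the list, with fetch doing the actual HTTP request, gives the front page plus the one-level-deep same-host pages. Links found on those second-level pages are deliberately not followed.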
-Boris