I have put together a set of URLs to crawl: about 40k so far, most of which have been updated in the last couple of months. They may make for interesting crawls. I got tired of using the "dmoz" dumps because the links were so old. There are a lot of them, of course. Anyway, if you are interested in internet crawls, you may want to try it out.
http://botspiritcompany.com/botlist/spring/pipes/rdf_nutch.html

--
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies
