Hi,

> My question: have you built a general site to crawl the internet, and
> how did you find links that people would be interested in, as opposed
> to capturing a lot of the junk out there?
Interesting question. Are you planning to build a new Google? If you plan to crawl without any limit to, e.g., a few domains, your indexes will go wild very quickly :-)

We are using Nutch now with an extensive list of 'interesting domains' - this list is an editorial effort. Search results are limited to those domains: http://www.labforculture.org/opensearch/custom

Another application would be to use Nutch to crawl certain pages, like 'interesting' search results from other sites, with a limited depth. This would yield 'interesting' indexes.

Yet another application would be to crawl 'interesting' RSS feeds with a depth of 1. I haven't got that working yet (see the parse-rss discussion these days).

Nevertheless, I am interested in the question: anyone else having examples of "possible public applications with Nutch"?

$2c,
*pike
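For what it's worth, the whitelist-plus-limited-depth setup above can be sketched roughly like this, assuming a Nutch 0.x-style install with the one-step `crawl` command; the domain name and directory names here are placeholders, not our actual list:

```shell
# conf/crawl-urlfilter.txt -- restrict the crawl to an editorial
# whitelist of 'interesting' domains (example.org is a placeholder):
#
#   -^(file|ftp|mailto):                  # skip non-http schemes
#   +^http://([a-z0-9]*\.)*example.org/   # accept whitelisted hosts
#   -.                                    # reject everything else
#
# Seed the urls/ directory with one start page per domain,
# then run a bounded crawl (depth and topN keep the index sane):
bin/nutch crawl urls -dir crawl.demo -depth 3 -topN 1000

# For the rss-feed idea: depth 1 fetches only the seed pages themselves,
# without following outlinks.
bin/nutch crawl feeds -dir crawl.rss -depth 1
```

The key point is the final `-.` rule: everything not explicitly accepted is dropped, so the crawl stays inside the editorial list instead of wandering off into the junk.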
