Hi,

> My question: have you built a general site to crawl the internet, and
> how did you find links that people would be interested in, as opposed
> to capturing a lot of the junk out there?
Interesting question. Are you planning to build a new Google? If you plan to crawl without any limit to, e.g., a few domains, your indexes will go wild very quickly :-)

We are using Nutch now with an extensive list of 'interesting domains' - this list is an editorial effort. Search results are limited to those domains: http://www.labforculture.org/opensearch/custom

Another application would be to use Nutch to crawl certain pages, like 'interesting' search results from other sites, with a limited depth. This would yield 'interesting' indexes.

Yet another application would be to crawl 'interesting' RSS feeds with a depth of 1. I haven't got that working yet (see the parse-rss discussion these days).

Nevertheless, I am interested in the question: anyone else having examples of "possible public applications with Nutch"?

$2c,
*pike
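For what it's worth, the whitelist-plus-limited-depth setup above can be sketched roughly like this, assuming a Nutch 0.x-style install with the one-step `crawl` command; the domain name and directory names here are placeholders, not our actual list:

```shell
# conf/crawl-urlfilter.txt -- restrict the crawl to an editorial
# whitelist of 'interesting' domains (example.org is a placeholder):
#
#   -^(file|ftp|mailto):                  # skip non-http schemes
#   +^http://([a-z0-9]*\.)*example.org/   # accept whitelisted hosts
#   -.                                    # reject everything else
#
# Seed the urls/ directory with one start page per domain,
# then run a bounded crawl (depth and topN keep the index sane):
bin/nutch crawl urls -dir crawl.demo -depth 3 -topN 1000

# For the rss-feed idea: depth 1 fetches only the seed pages themselves,
# without following outlinks.
bin/nutch crawl feeds -dir crawl.rss -depth 1
```

The key point is the final `-.` rule: everything not explicitly accepted is dropped, so the crawl stays inside the editorial list instead of wandering off into the junk.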
