Hi all,
I have been trying to run a crawl on a couple of different domains using
Nutch:
bin/nutch crawl urls -dir crawled -depth 3
Every time I get the response:
"Stopping at depth=x - no more URLs to fetch." Sometimes a page or two at
the first level get crawled; in most other cases, nothing gets fetched at all.
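For what it's worth, "no more URLs to fetch" at the first depth usually means the URL filters are rejecting your seed URLs, or the HTTP agent name is unset (Nutch silently fetches nothing without it). A sketch of the two config spots worth checking — the domain and agent name below are placeholders, and the exact filter file name depends on your Nutch version (the crawl command reads crawl-urlfilter.txt in older releases):

```
# conf/crawl-urlfilter.txt (or regex-urlfilter.txt, depending on version):
# make sure a rule actually accepts your seed domain, e.g.
+^http://([a-z0-9]*\.)*example.com/

# conf/nutch-site.xml: the agent name must be non-empty
<property>
  <name>http.agent.name</name>
  <value>MyCrawler</value>
</property>
```

Also check that your urls directory contains a plain-text file with one seed URL per line.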
Hi all...
I was finally able to set up a multi-node Nutch cluster that seemed to work
fine. When I set it up to do the example crawl of http://lucene.apache.org,
the crawl seemed to finish successfully, as indicated by the output on
the console. When I copied the index files onto the local filesystem…
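In case it helps, a common way to pull the crawl output out of HDFS is sketched below. The directory names assume the crawl was run with -dir crawled; the dfs/fs subcommand spelling varies by Hadoop version, so adjust as needed:

```shell
# verify the crawl output (index/, segments/, crawldb/) exists on HDFS
bin/hadoop dfs -ls crawled

# copy the whole crawl directory to the local filesystem
bin/hadoop dfs -copyToLocal crawled /path/to/local/crawled
```

After copying, the search webapp's searcher.dir property (in nutch-site.xml) needs to point at the local copy, not the HDFS path.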
Hi All...
I have been trying to set up Nutch on a cluster of 3 machines. I could get
the crawling and searching processes to run independently on all 3 machines,
but when I try to integrate them into a single cluster, none of the
slaves show up in the node listing on the Hadoop Machine List page.
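When slaves never appear on the Machine List, the usual suspects are the master's conf/slaves file, fs.default.name pointing at localhost instead of the master's hostname, or passwordless SSH not being set up from master to slaves. A sketch of what to check — hostnames below are placeholders:

```
# conf/slaves on the master: one slave hostname per line
slave1.example.com
slave2.example.com

# conf/hadoop-site.xml on ALL nodes: fs.default.name and
# mapred.job.tracker must name the master host, never localhost
<property>
  <name>fs.default.name</name>
  <value>master.example.com:9000</value>
</property>
```

After bin/start-all.sh, `bin/hadoop dfsadmin -report` on the master should list every live DataNode; the logs directory on a missing slave usually says why it failed to register.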