Thanks for the explanation :)

-----Original Message-----
From: Paul Baclace [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 10, 2005 5:18 AM
To: nutch-dev@lucene.apache.org
Subject: Re: Distributed nutch

In addition to Stefan Groschupf's detailed references, here are some
short, high-level answers to your questions:

Rozina Sorathia wrote:
 > 1. What is distributed Nutch?

  Nutch is a distributed search engine built on Lucene, with large-scale
web crawling.

 > 2. How does distributed Nutch work?

  It is modeled after Google's MapReduce and the Google File System
(GFS), a single-master, multiple-slave design tuned for 100-1000 nodes.
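
  To make the Map-Reduce model a bit more concrete, here is a minimal
single-process sketch in Python. It is not Nutch code: the function names
(map_phase, shuffle, reduce_phase, run_job) and the word-count job are
made up for illustration, and each "split" only stands in for the share
of input that one slave would process.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as done between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)

def run_job(splits):
    """Run the job over input splits (hypothetical: one split per slave)."""
    intermediate = []
    for split in splits:
        for doc_id, text in split:
            intermediate.extend(map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())

# Two splits, standing in for the input handled by two slaves.
splits = [
    [("page1", "nutch crawls the web")],
    [("page2", "nutch indexes the web")],
]
word_counts = run_job(splits)
```

  In a real cluster the map and reduce calls would run on different
machines and the shuffle would move data over the network; the sketch
only shows the shape of the computation.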

 > 3. When we say distributed, what is distributed?

  The filesystem is distributed with multiple copies of files on
separate machines.  Crawling, parsing, sorting, and indexing are also
distributed.

 > 4. When one server goes down, what happens?

  If the master goes down, it can be restarted from a checkpointed state
file.
  If a slave goes down, there is redundancy so that operations continue,
data is not lost, and work in progress dependent on the dead node is
automatically restarted.
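
  The slave-failure case can be sketched as a toy model in Python. This
is only an illustration under assumptions of my own, not how Nutch
implements it: the replication factor of 2, the round-robin placement,
and the names place_blocks and handle_slave_failure are all hypothetical.

```python
def place_blocks(blocks, nodes, replication=2):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement

def handle_slave_failure(placement, tasks, dead, nodes, replication=2):
    """Drop the dead node, restore replica counts, and restart its tasks."""
    live = [n for n in nodes if n != dead]
    for holders in placement.values():
        holders.discard(dead)
        # Copy the block to other live nodes until it is fully replicated again.
        for candidate in live:
            if len(holders) >= replication:
                break
            holders.add(candidate)
    for task, (block, node) in list(tasks.items()):
        if node == dead:
            # Restart the task on a live node that holds a copy of its block.
            tasks[task] = (block, sorted(placement[block])[0])

nodes = ["slave1", "slave2", "slave3"]
placement = place_blocks(["b0", "b1", "b2"], nodes)
# Work in progress: task name -> (input block, node currently running it).
tasks = {"index-b0": ("b0", "slave2"), "index-b2": ("b2", "slave3")}

handle_slave_failure(placement, tasks, dead="slave2", nodes=nodes)
```

  After the call, every block again has two copies on live nodes, and the
task that was running on the dead slave has been moved to a node that
holds its input, while the unaffected task stays where it was.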

Nutch version 0.8 is distributed (still under development in the
"mapred" branch) and earlier versions are not distributed.



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
