Thanks for the explanation :)

-----Original Message-----
From: Paul Baclace [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 10, 2005 5:18 AM
To: nutch-dev@lucene.apache.org
Subject: Re: Distributed nutch

In addition to Stefan Groschupf's detailed references, here are some short, high-level answers to your questions:

Rozina Sorathia wrote:
> 1. What is Distributed nutch

Nutch is a distributed Lucene with large-scale web crawling.

> 2. How nutch distributed works?

It is modeled after Google's Map-Reduce and Google FS, a single-master, multiple-slave design tuned for 100-1000 nodes.

> 3. When we say distributed, what is distributed?

The filesystem is distributed, with multiple copies of each file kept on separate machines. Crawling, parsing, sorting, and indexing are also distributed.

> 4. When one server goes down, what happens?

If the master goes down, it can be restarted from a checkpointed state file. If a slave goes down, redundancy ensures that operations continue and no data is lost, and any work in progress that depended on the dead node is automatically restarted.

Nutch version 0.8 is distributed (it is still under development in the "mapred" branch); earlier versions are not distributed.
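
To make answers 2 and 4 a bit more concrete, here is a small Python sketch of the general map-reduce idea with a task being rescheduled when a worker is down. This is not Nutch code and does not reflect the "mapred" branch's actual API: every name in it (map_phase, reduce_phase, run_with_retry, the two worker functions) is hypothetical, and the "workers" are plain in-process functions standing in for slave nodes.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per word in a document."""
    return [(term.lower(), doc_id) for term in text.split()]


def reduce_phase(term, doc_ids):
    """Reduce step: merge all doc ids seen for a term into a sorted list."""
    return term, sorted(set(doc_ids))


def healthy_worker(task, args):
    """Hypothetical slave node that runs the task normally."""
    return task(*args)


def dead_worker(task, args):
    """Hypothetical slave node that is down (simulated with an exception)."""
    raise ConnectionError("worker unreachable")


def run_with_retry(task, args, workers):
    """Try each worker in turn; if one is down, restart the task on the next."""
    for worker in workers:
        try:
            return worker(task, args)
        except ConnectionError:
            continue  # node is dead: reschedule the same task elsewhere
    raise RuntimeError("no live worker could run the task")


def build_index(docs, workers):
    """Run map over every document, group by term, then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for term, d in run_with_retry(map_phase, (doc_id, text), workers):
            grouped[term].append(d)
    return dict(
        run_with_retry(reduce_phase, (term, ids), workers)
        for term, ids in grouped.items()
    )


docs = {"a": "Nutch crawls the web", "b": "Lucene indexes the web"}
# The first worker is always down, so every task is restarted on the second.
index = build_index(docs, [dead_worker, healthy_worker])
```

In this toy version the first worker always fails, so each map and reduce task is simply rerun on the second one; a real system would also have to track which tasks were in flight on the dead node, which the sketch leaves out.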