Re: [Nutch-general] Question: Nutch distributed WebDB configuration and possibility of Nutch distributed Web crawling

Stefan Groschupf Thu, 07 Oct 2004 11:07:43 -0700

Hi,

> The Nutch website mentions that a distributed WebDB is implemented so that the WebDB database can span multiple machines. How do you configure the distributed WebDB? I looked through several email threads and could not find a satisfactory answer. All I know is how to query a distributed WebDB: you create a search-servers.txt on the Nutch client, i.e. the Tomcat server, list each host with its port number in that file, and run bin\nutch server portnumber on the other servers where the distributed WebDB is present. Is this understanding correct? However, when you run the fetch and it updates a WebDB that is going to grow beyond one machine, do you need to configure something specific so that the WebDB knows it has to expand to another machine?

As far as I know, the distributed WebDB update has been broken in CVS head since some changes related to the distributed file system were made.
In general, I think the distributed WebDB will no longer be necessary once we have the distributed file system up and running.
Anyway, the developer mailing list archive should contain some mails that describe what you need to do.
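
For reference, the search-side setup described in the question could be sketched roughly as follows. The host names and port are placeholders, and the one "host port" pair per line layout of search-servers.txt is an assumption here, so please check it against your Nutch version. This only covers querying, not the WebDB update.

```shell
# Sketch only: searchhost1, searchhost2 and port 9999 are hypothetical.
# Assumed layout: one "host port" pair per line.
cat > search-servers.txt <<'EOF'
searchhost1 9999
searchhost2 9999
EOF

# On each search host (not run here), start the server on the
# matching port, as described in the question:
#   bin/nutch server 9999
```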

> Also, does Nutch support distributed Web crawling?

No. You will find a set of discussions about this in the Nutch developer mailing list.
The conclusion stated there is that distributed crawling brings no performance improvement to crawling.

> Thanks in advance for your quick pointers on the questions above.

HTH
Stefan
