Usually you wouldn't cut indexes. When doing distributed searching,
you usually crawl, process, and index a batch of documents
(say 10 million) at a time, pushing them, along with their segments,
out to a distributed search server on a local file system. Then you
move on to the next batch, and the next, until you run out of
available hardware resources. At that point you reset the crawldb so
every document is crawlable again, and you start the process all over.
There isn't an index cutter per se. You can use the segment merger to
combine multiple segments into one and then index the merged segment,
but I have found that the shard approach above is a better option in
most cases.
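For reference, the segment-merge route would look roughly like the
sketch below using the Nutch 1.x command-line tools. The directory
names (crawl/segments, crawl/merged, etc.) are assumptions for
illustration; check the usage output of mergesegs and index for your
Nutch version.

```shell
# Merge several crawl segments into a single segment (SegmentMerger).
# "crawl/segments" holds the per-batch segments; "crawl/merged" is the
# output directory for the merged segment (both are example paths).
bin/nutch mergesegs crawl/merged -dir crawl/segments

# Index the merged segment, using the crawldb and linkdb for scoring
# and anchor text.
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/merged/*
```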
Dennis
On 11/14/2010 11:07 PM, 朱诗雄 wrote:
hi,all
I want to use nutch for distributed searching, but I don't know how to
cut the indexes for it.
Is there a guide for that?