So the indexes are built from separate batches of documents on different servers. Is there any mechanism to make the df (document frequency) accurate across all of them, or is the df just relative to the index built from one batch?
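To make the concern concrete, here is a minimal sketch of how a term can end up with very different idf values when each shard scores using only its local df. This is not Nutch code; it just assumes the classic Lucene-style formula idf = 1 + ln(N / (df + 1)), and the shard sizes and document frequencies are made up for illustration:

    // ShardIdfExample.java -- illustration only; formula and numbers are assumptions.
    public class ShardIdfExample {
        // Classic Lucene-style idf, assumed here for the sake of the example.
        static double idf(long numDocs, long docFreq) {
            return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
        }

        public static void main(String[] args) {
            // Two shards indexed from different crawl batches.
            long n1 = 10000000L, df1 = 5000L;    // term is rare in shard 1
            long n2 = 10000000L, df2 = 400000L;  // term is common in shard 2

            // Each shard scores with only its own statistics...
            System.out.println("shard 1 idf: " + idf(n1, df1));
            System.out.println("shard 2 idf: " + idf(n2, df2));

            // ...while a single merged index would use the global numbers.
            System.out.println("global idf:  " + idf(n1 + n2, df1 + df2));
        }
    }
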
2010/11/15 Dennis Kubes <[email protected]>:
> Usually you wouldn't cut indexes. When doing distributed searching usually
> you are crawling, processing, and indexing a batch of documents (say 10
> million) at a time and pushing them out to a distributed search server on a
> local file system along with their segments. Then you would move on to the
> next batch and the next until you run out of available hardware resources.
> Then you reset the crawldb so every document is crawlable again and you
> start the process all over.
>
> There isn't an index cutter per se. You can use the segment merger to put
> multiple segments together and then index that segment. I have found that
> the shard approach above is a better option in most cases.
>
> Dennis
>
> On 11/14/2010 11:07 PM, 朱诗雄 wrote:
>>
>> hi, all
>>
>> I want to use nutch for distributed searching. But I don't know how to
>> cut indexes for distributed searching?
>> Is there a guide for that?
>>

--
Thanks and best regards.
zsx

