Usually you wouldn't cut indexes. When doing distributed searching,
you usually crawl, process, and index a batch of documents
(say 10 million) at a time, pushing them, along with their segments,
out to a distributed search server on a local file system. Then you
move on to the next batch, and the next, until you run out of
available hardware resources. At that point you reset the crawldb so
every document is crawlable again, and you start the process all over.
There isn't an index cutter per se. You can use the segment merger to
combine multiple segments into one and then index the merged segment,
but I have found that the shard approach above is a better option in
most cases.
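For reference, the segment-merge route would look roughly like the
sketch below using the Nutch 1.x command-line tools. The directory
names (crawl/segments, crawl/merged, etc.) are assumptions for
illustration; check the usage output of mergesegs and index for your
Nutch version.

```shell
# Merge several crawl segments into a single segment (SegmentMerger).
# "crawl/segments" holds the per-batch segments; "crawl/merged" is the
# output directory for the merged segment (both are example paths).
bin/nutch mergesegs crawl/merged -dir crawl/segments

# Index the merged segment, using the crawldb and linkdb for scoring
# and anchor text.
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/merged/*
```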
Dennis
On 11/14/2010 11:07 PM, 朱诗雄 wrote:
hi,all
I want to use nutch for distributed searching, but I don't know how to
cut the indexes for it.
Is there a guide for that?