On 2010-11-15 15:10, Dennis Kubes wrote:
> Usually you wouldn't cut indexes. When doing distributed searching,
> you are usually crawling, processing, and indexing a batch of documents
> (say 10 million) at a time and pushing them out to a distributed search
> server on a local file system along with their segments. Then you
> move on to the next batch, and the next, until you run out of available
> hardware resources. Then you reset the crawldb so every document is
> crawlable again, and you start the process all over.
> 
> There isn't an index cutter per se. You can use the segment merger to
> put multiple segments together and then index that segment. I have found
> that the shard approach above is a better option in most cases.

I fully agree with what Dennis said above; I just wanted to point out
that there is an IndexSplitter tool (actually two similar tools - a
limited IndexSplitter and a more flexible MultiPassIndexSplitter) in
Lucene that you can use to split indexes.
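For example, MultiPassIndexSplitter can be run from the command line
roughly like this (the paths, part count, and jar names below are just
placeholders - adjust them for your Lucene version and layout):

```shell
# Split one index into 4 parts (documents are distributed round-robin
# by doc id; pass -seq to split into sequential docid ranges instead).
# Assumes the Lucene core and misc contrib jars are on the classpath.
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.index.MultiPassIndexSplitter \
  -out /path/to/output -num 4 /path/to/input-index
```

Note that it is "multi-pass" because it reads the input index once per
output part, so splitting a large index into many parts can take a while.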



-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
