On 2010-11-15 15:10, Dennis Kubes wrote:
> Usually you wouldn't cut indexes. When doing distributed searching
> usually you are crawling, processing, and indexing a batch of documents
> (say 10 million) at a time and pushing them out to a distributed search
> server on a local file system along with their segments. Then you would
> move on to the next batch and the next until you run out of available
> hardware resources. Then you reset the crawldb so every document is
> crawlable again and you start the process all over.
>
> There isn't an index cutter per se. You can use the segment merger to
> put multiple segments together and then index that segment. I have found
> that the shard approach above is a better option in most cases.
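The merge-then-index workflow Dennis mentions looks roughly like this in Nutch 1.x (directory names here are just examples, assuming the standard crawl layout):

```shell
# Merge several crawled segments into a single segment
# (crawl/segments holds the per-batch segments)
bin/nutch mergesegs crawl/MERGEDsegments -dir crawl/segments

# Index the merged segment against the crawldb and linkdb
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/MERGEDsegments/*
```

Each batch indexed this way becomes one shard that you can push to its own search server.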
I fully agree with what Dennis said above; I just wanted to point out that there is an IndexSplitter tool (actually two similar tools: a limited IndexSplitter and a more flexible MultiPassIndexSplitter) in Lucene that you can use to split indexes.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || | |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
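[Editor's note: a sketch of invoking the splitter tools from Lucene's contrib/misc module; jar names and paths are illustrative and depend on your Lucene version.]

```shell
# MultiPassIndexSplitter: split one or more input indexes into N parts
# by filtering documents in multiple passes (works on any index)
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.index.MultiPassIndexSplitter \
  -out /path/to/split-output -num 4 /path/to/input-index

# IndexSplitter: list the segments of an index, then copy selected
# segments out into a new index (limited: splits only on segment
# boundaries, so it needs a multi-segment index)
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.index.IndexSplitter /path/to/input-index -l
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.index.IndexSplitter /path/to/input-index \
  /path/to/split-output _0 _1
```

MultiPassIndexSplitter rereads the source index once per output part, so it is slower but does not care how the documents are distributed across segments.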

