Does merging not happen often enough to keep the number of deleted documents low?
Maybe Solr needs a "partial" optimization, in which any segment with too many deleted documents is copied to a new file without the unnecessary data. That way, cleaning out deleted data would be compatible with lightweight replication.

I'm worried by the idea that deleted documents influence relevance scores. Any pointers on how significant this influence can be?

Pierre

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, July 22, 2011 16:42
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

On 7/22/2011 8:23 AM, Pierre GOSSE wrote:
> I've read that in a thread titled "Weird optimize performance degradation",
> where Erick Erickson states that "Older versions of Lucene would search
> faster on an optimized index, but this is no longer necessary.", and more
> recently in a thread you initiated a month ago, "Question about optimization".
>
> I'd also be very interested if anyone has more precise ideas or data on the
> benefits and trade-offs of optimize vs. merge ...

My most recent testing has been with Solr 3.2.0. I have noticed some speedup after optimizing an index, but the gain is not earth-shattering.

My index consists of 7 shards. One of them is small, and receives all new documents every two minutes. The others are large and, aside from deletes, are mostly static. Once a day, the oldest data is distributed from the small shard to its proper place in the other six shards.

The small shard is optimized once an hour, which usually takes less than a minute. I optimize one large shard every day, so each one gets optimized once every six days. That optimize takes 10-15 minutes.

The only reason that I optimize is to remove deleted documents; whatever speedup I get is just icing on the cake. Deleted documents take up space and continue to influence the relevance scoring of queries, so I want to remove them.

Thanks,
Shawn
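
To give a rough feel for how deleted documents can influence scoring, here is a minimal sketch. It assumes the classic TF-IDF formula from Lucene's DefaultSimilarity in the 3.x line, idf = 1 + ln(maxDoc / (docFreq + 1)), and assumes that both maxDoc and docFreq keep counting deleted documents until the segments holding them are merged. All document counts below are invented for illustration and are not measurements from any real index.

```python
import math


def idf(doc_freq, max_doc):
    """Classic Lucene-style idf: 1 + ln(maxDoc / (docFreq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical shard: 1,000,000 live documents, 5,000 of which contain the term.
live_docs, live_df = 1_000_000, 5_000

# Hypothetical deleted documents still sitting in unmerged segments:
# 200,000 in total, 4,000 of which contain the term.
deleted_docs, deleted_df = 200_000, 4_000

# After an optimize, only live documents contribute to the statistics.
idf_optimized = idf(live_df, live_docs)

# Before the deletes are merged away, they still inflate both counts.
idf_with_deletes = idf(live_df + deleted_df, live_docs + deleted_docs)

print(f"idf after optimize:     {idf_optimized:.3f}")
print(f"idf with deletes:       {idf_with_deletes:.3f}")
print(f"relative change in idf: {idf_with_deletes / idf_optimized - 1:+.1%}")
```

In this sketch the deletes are deliberately concentrated on one term, which is what moves its idf. If deletes are spread evenly across all terms, the ratio maxDoc / docFreq barely changes, and so does the idf.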