Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by JamesBrady:
http://wiki.apache.org/solr/CollectionDistribution

The comment on the change is:
Change "ten minute" claim about optimize times

------------------------------------------------------------------------------
  
  On a very large index, adding even a few documents and then running an 
optimize means rewriting the complete index. This consumes a lot of disk I/O 
and impacts query performance. Optimizing a very large index may even involve 
copying the index twice: the current code for merging one index into 
another calls optimize at the beginning ''and'' the end. If some docs have 
been deleted, the first optimize call will rewrite the index even before the 
second index is merged.
  
- Optimizations can take nearly ten minutes to run.  We do not know what 
happens to query performance on a collection that has not been optimized for a 
long time. We ''do'' know that it will get worse as the collection becomes more 
fragmented, but   how much worse is very dependent on the manner of updates and 
commits to the collection.
+ Optimization is an I/O-intensive process, as the entire index is read and 
rewritten in optimized form. Anecdotal data shows that optimizations on modest 
server hardware can take around 5 minutes per GB of index, although this 
varies considerably with index fragmentation and hardware bottlenecks. We do 
not know what happens to query performance on a collection that has not been 
optimized for a long time. We ''do'' know that it will get worse as the 
collection becomes more fragmented, but how much worse depends heavily on the 
manner of updates and commits to the collection.
  
  We are presuming optimizations should be run once following large 
''batch-like'' updates to the collection and/or once a day.
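
As a rough illustration of the "around 5 minutes per GB" figure above, the 
following sketch turns an index size into an estimated optimize duration. The 
function name, the `rewrites` parameter, and the constant are hypothetical and 
are not part of Solr; the rate is the anecdotal number from the text, not a 
measured value, so treat the result as an order-of-magnitude guess.

```python
# Anecdotal rate quoted in the text; an assumption, not a benchmark.
MINUTES_PER_GB = 5.0


def estimate_optimize_minutes(index_size_gb, rewrites=1,
                              minutes_per_gb=MINUTES_PER_GB):
    """Estimate optimize duration in minutes.

    index_size_gb  -- size of the index on disk, in GB
    rewrites       -- how many times the index is rewritten; 2 models the
                      merge case where optimize runs at the beginning and
                      at the end (hypothetical parameter)
    minutes_per_gb -- assumed throughput, defaulting to the anecdotal figure
    """
    if index_size_gb < 0 or rewrites < 0:
        raise ValueError("index size and rewrites must be non-negative")
    return index_size_gb * rewrites * minutes_per_gb


if __name__ == "__main__":
    for size_gb in (1, 2, 10):
        minutes = estimate_optimize_minutes(size_gb)
        print(f"{size_gb} GB index: roughly {minutes:.0f} minutes")
```

Under this assumed rate, a 2 GB index would take about ten minutes for a 
single rewrite, and about twice that if the index ends up being rewritten 
twice during a merge.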
  
