Hi Dmitry, I am assuming you are splitting one very large index over multiple shards rather than replicating an index multiple times.
Just for a point of comparison, I thought I would describe our experience with large shards. At HathiTrust, we run a 6 terabyte index over 12 shards. This is split over 4 machines with 3 shards per machine, and our shards are about 400-500 GB each. We get average response times of around 200 ms, with 99th-percentile queries up around 1-2 seconds. We have a very low query rate, i.e. less than 1 qps. We also index offline on a separate machine and update the indexes nightly.

Some of the issues we have found with very large shards are:

1) Because of the very large shard size, I/O tends to be the bottleneck, with phrase queries containing common words being the slowest.

2) Because of the I/O issues, running cache-warming queries to get postings into the OS disk cache is important, as is leaving significant free memory for the OS to use for disk caching.

3) Because of the I/O issues, using stop words or CommonGrams produces a significant performance increase.

4) We have a huge number of unique terms in our indexes. In order to reduce the amount of memory needed by the in-memory terms index, we set the termInfosIndexDivisor to 8, which causes Solr to load only every 8th term from the tii file into memory. This reduced memory use from over 18 GB to below 3 GB and got rid of 30-second stop-the-world Java garbage collections. (See http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again for details.) We later ran into memory problems when indexing, so we instead changed the index-time parameter termIndexInterval from 128 to 1024. (More details here: http://www.hathitrust.org/blogs/large-scale-search)

Tom Burton-West
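For anyone who wants to set up the cache warming mentioned above, a QuerySenderListener in solrconfig.xml will fire queries when a new searcher opens. This is only a sketch; the query strings below are hypothetical stand-ins for whatever queries touch your hottest postings.

```xml
<!-- Sketch of a warming hook in solrconfig.xml. The query strings are
     hypothetical examples; use queries that hit your common phrase terms
     so their postings end up in the OS disk cache. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">"the history of science"</str></lst>
    <lst><str name="q">"to be or not to be"</str></lst>
  </arr>
</listener>
```

A matching listener on the newSearcher event keeps the cache warm across commits.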
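If you want to try CommonGrams, the analysis chain is configured in schema.xml. The sketch below assumes a whitespace tokenizer; the fieldType name and the common-words file name are hypothetical examples, not our production setup.

```xml
<!-- Sketch of a CommonGrams analysis chain for schema.xml. CommonGrams
     forms bigrams from adjacent common words at index time, so phrase
     queries full of common words avoid scanning huge postings lists.
     "text_cg" and "commonwords.txt" are example names. -->
<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note the query side uses CommonGramsQueryFilterFactory so that queries match the bigrams produced at index time.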
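For the terms-index settings, both knobs live in solrconfig.xml. Treat this as a sketch only: the exact element and parameter names have varied across Solr versions, so check the example solrconfig.xml shipped with your release.

```xml
<!-- Search-time option: tell the IndexReader to load only every 8th term
     from the .tii terms index into memory (parameter name may differ by
     Solr version). -->
<indexReaderFactory name="IndexReaderFactory" class="solr.StandardIndexReaderFactory">
  <int name="setTermIndexDivisor">8</int>
</indexReaderFactory>

<!-- Index-time alternative: write a sparser terms index in the first
     place, one indexed term per 1024 instead of the default 128. -->
<indexDefaults>
  <termIndexInterval>1024</termIndexInterval>
</indexDefaults>
```

The index-time route costs nothing at search time but requires reindexing; the divisor takes effect on the next searcher without touching the index.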