On 12/12/2016 1:14 PM, Piyush Kunal wrote:
> We did the following change:
>
> 1. Previously we had 1 shard and 32 replicas for 1.2million documents of
> size 5 GB.
> 2. We changed it to 4 shards and 8 replicas for 1.2 million documents of
> size 5GB

How many machines and shards per machine were you running in both
situations?  For either setup, I would recommend at least 32 machines,
where each one handles exactly one shard replica.  For the latter setup,
you may need even more machines, so there are more replicas.

> We have a combined RPM of around 20k rpm for solr.

Twenty thousand queries per minute is over 300 per second.  This is a
very high query rate, which is going to require many replicas.  Your
replica count has gone down significantly with the change you made.

> But unfortunately we saw a degrade in performance with RTs going insanely
> high when we moved to setup 2.

With such a high query rate, I'm not really surprised that this caused
the performance to go down, even if you actually do have 32 machines. 
Distributed queries are a two-phase process where the coordinating node
sends individual queries to each shard to find out how to sort the
sub-results into one final result, and then sends a second query to
relevant shards to request the individual documents it needs for the
result.  The total number of individual queries goes up significantly.

Before Solr was doing one query for one result.  Now it is doing between
five and nine queries for one result (the initial query from your client
to the coordinating node, the first query to each of the four shards,
and then a possible second query to each shard).  If the number of
search hits is more than zero, it will be at least six queries.  This is
why one shard is preferred for high query rates if you can fit the whole
index into one shard.  Five gigabytes is a pretty small Solr index.

Sharding is most effective when the query rate is low, because Solr can
take advantage of idle CPUs.  It makes it possible to have a much larger
index.  A high query rate means that there are no idle CPUs.

Thanks,
Shawn

Reply via email to