Yes, if your average query touches too many documents (such as huge OR 
queries) and has per-hit processing (scoring, result transformation, 
highlighting, etc.), then simply splitting the elephant across shards may 
help. Or if you ask for 100 facets and your facets are slow, you could 
perhaps use facet.threads to speed up that part. Or if you use grouping, 
you could try collapse instead. Etc etc. We need to know more about your 
data, queries and use case to say what the cure might be.
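
For concreteness, this is roughly what those two knobs look like as request 
parameters. The field names (brand, category, groupId) are made up for 
illustration:

  # Run faceting for multiple facet fields in parallel:
  &facet=true&facet.field=brand&facet.field=category&facet.threads=4

  # Collapse instead of grouping, via the collapse query parser:
  &fq={!collapse field=groupId}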

Jan

> 13. sep. 2023 kl. 16:22 skrev Walter Underwood <wun...@wunderwood.org>:
> 
> This is all great advice.
> 
> There is no optimal number of shards. I’ve run clusters with 4 shards; we 
> currently have one cluster with 96 shards and one with 320 shards. The next 
> one we build out will probably not be sharded.
> 
> With long queries, I’ve usually seen a roughly linear speedup with sharding. 
> Double the shards, halve the response time.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Sep 13, 2023, at 4:48 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>> 
>> Hi,
>> 
>> There are no hard rules wrt sharding; it often comes down to measuring and 
>> experimenting with your workload.
>> 
>> There are other things to consider than shard size. Why are the queries 
>> slow? How many rows do you ask for? Do you use faceting? Grouping?
>> You have 25Gb of data on each of the 8 nodes/shards. Now, how much RAM does 
>> each node have, and how much RAM did you allocate to Solr/Java?
>> A common mistake is to allocate too much RAM/heap to Solr, so that you 
>> don't leave anything for virtual memory caching in Linux.
>> Say you have 32Gb of physical RAM on a node. Then do not give 30Gb of it 
>> to Solr. Instead, give 8Gb to Solr and leave 24Gb available for disk caching.
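>> 
>> In a default install, that heap cap is set in solr.in.sh; a sketch (the 
>> exact value is of course workload-dependent):
>> 
>>   # Cap the JVM heap at 8g; the OS keeps the rest for the page cache
>>   SOLR_HEAP="8g"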
>> 
>> Another thing to consider is whether your queries can be rewritten into 
>> more efficient equivalents. Sometimes, Solr-level caches can also help.
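>> 
>> A classic rewrite of that kind is moving a clause that repeats across many 
>> queries from q into fq, so it is cached in the filterCache (field names 
>> made up):
>> 
>>   # Before: the category restriction is scored as part of q every time
>>   q=title:phone AND category:electronics
>> 
>>   # After: the restriction is a filter, cached across queries
>>   q=title:phone&fq=category:electronics
>> 
>> The filterCache itself is sized in solrconfig.xml; the numbers below are 
>> placeholders, not a recommendation:
>> 
>>   <filterCache class="solr.CaffeineCache" size="512" autowarmCount="128"/>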
>> 
>> Wrt sharding efficiency: if you already have 8 shards, it is not much more 
>> expensive to go to 16, but you do increase the risk of a single node 
>> failure affecting your requests...
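>> 
>> Back-of-envelope on that risk (illustrative numbers only): if each node is 
>> up with probability p, a query that must reach all N shards succeeds with 
>> probability p^N. With p = 0.999 that is about 0.992 for 8 shards but about 
>> 0.984 for 16, so the per-query failure risk roughly doubles.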
>> 
>> Jan
>> 
>>> 13. sep. 2023 kl. 10:32 skrev Saksham Gupta 
>>> <saksham.gu...@indiamart.com.INVALID>:
>>> 
>>> Hi All,
>>> 
>>> I have been trying to reduce the response time of our SolrCloud cluster 
>>> (v8.10, 8 nodes). To achieve this, I have tried increasing the number of 
>>> shards, which reduces the data size on each shard and thereby the response 
>>> time.
>>> 
>>> 
>>> I have encountered a few questions regarding sharding strategy:
>>> 
>>> 1. How do I decide the ideal number of shards? Is there a minimum or 
>>> maximum number of shards that should be used?
>>> 
>>> 2. What is the minimum shard size below which reducing the size further 
>>> won't improve response time (because other factors, like aggregating 
>>> results across shards, will dominate)?
>>> 
>>> 3. Is there a maximum limit to the amount of data that should be kept in 
>>> a shard?
>>> 
>>> 
>>> As of now we have 8 shards, each on a separate node, with ~25 GB of data 
>>> (15-16 million docs) on each shard. Please advise me on standard 
>>> approaches for choosing the number of shards and the shard size. Thanks 
>>> in advance.
>> 
> 
