Re: Optimal Sharding Strategy for Solr Cloud v8.10

Walter Underwood Thu, 14 Sep 2023 07:36:56 -0700

I would have thought it would be a huge hassle, but it isn’t. We build a 
cluster with ArgoCD and Kubernetes deployments. The nodes all report to 
DataDog. The only real bother is that it is hard to go directly to the admin UI 
on a specific node; something doesn’t work right with the hostnames and 
permissions from outside the cluster.


Oh, and we run blue/green clusters, so there are two of these beasts.

The graphical display in the admin UI is pretty impressive. Here is a view of 
part of it.

https://www.dropbox.com/scl/fi/99xfgek24qocowhft6q7b/Screenshot-2023-09-14-at-7.27.16-AM.png?rlkey=nmjyrl9z0n92lgidfei45vq4q&dl=0

The collection currently has about 2.5 billion documents. When I worked at 
Infoseek, our index of the entire web was 12 million documents.

This is at LexisNexis.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 13, 2023, at 11:47 PM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> 
> wrote:
> 
> @Walter,
> 
> how on earth are you monitoring all vital Solr Cloud Parameters for 320 
> shards?
> 
> Regards,
> Bernd
> 
> 
> On 13.09.23 at 16:22, Walter Underwood wrote:
>> This is all great advice.
>> There is no optimal number of shards. I’ve run clusters with 4 shards, we 
>> currently have one cluster with 96 shards and one with 320 shards. The next 
>> one we build out will probably not be sharded.
>> With long queries, I’ve usually seen a roughly linear speedup with sharding. 
>> Double the shards, halve the response time.
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>> On Sep 13, 2023, at 4:48 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>>> 
>>> Hi,
>>> 
>>> There are no hard rules wrt sharding, it often comes down to measuring and 
>>> experimenting for your workload.
>>> 
>>> There are other things to consider than shard size. Why are the queries 
>>> slow? How many rows do you ask for? Do you use faceting? Grouping?
>>> You have 25 GB of data on each of the 8 nodes/shards. Now, how much RAM does 
>>> each node have, and how much RAM did you allocate to Solr/Java?
>>> A common mistake is to allocate too much RAM/heap to Solr, so you don't get 
>>> any virtual memory caching in Linux.
>>> Say you have 32 GB of physical RAM on the nodes. Then do not give 30 of 
>>> those to Solr. Instead give 8 GB to Solr and let 24 GB be available for disk 
>>> caching.
>>> 
>>> Another thing to consider is whether your queries can be optimized by 
>>> rewriting them to more efficient equivalents. Sometimes, Solr-level 
>>> caches can also help.
>>> 
>>> Wrt shard efficiency: If you already have 8 shards, it is not much more 
>>> expensive to go to 16, but you increase the risk of a single failure 
>>> affecting your requests...
>>> 
>>> Jan
>>> 
>>>> On 13 Sep 2023, at 10:32, Saksham Gupta 
>>>> <saksham.gu...@indiamart.com.INVALID> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> I have been trying to reduce the response time of Solr Cloud (v8.10, 8
>>>> nodes). To achieve this, I have tried increasing the number of shards,
>>>> which should reduce the data size on each shard and thereby the
>>>> response time.
>>>> 
>>>> 
>>>> I have encountered a few questions regarding sharding strategy:
>>>> 
>>>> 1. How to decide the ideal number of shards? Is there a minimum or maximum
>>>> number of shards which should be used?
>>>> 
>>>> 2. What is the minimum size of a shard after which reducing the size
>>>> further won't have any effect on the response time (as time taken by other
>>>> factors like data aggregation will compensate for that) ?
>>>> 
>>>> 3. Is there some maximum limit to the size of data that should be kept in a
>>>> shard?
>>>> 
>>>> 
>>>> As of now we have 8 shards, each on a separate node, with ~25 GB of
>>>> data (15-16 million docs) on each shard. Please advise me of the
>>>> standard approaches to define the number of shards and shard size. Thanks
>>>> in advance.
>>> 
> 
> -- 
> *************************************************************
> Bernd Fehling                    Bielefeld University Library
> Dipl.-Inform. (FH)                LibTec - Library Technology
> Universitätsstr. 25                  and Knowledge Management
> 33615 Bielefeld
> Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
>          https://www.ub.uni-bielefeld.de/~befehl/
> 
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
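
Jan's heap-sizing advice quoted above (32 GB of physical RAM: give 8 GB to
Solr, leave 24 GB for disk caching) can be sketched as a small calculation.
This is only an illustration: the function name split_ram and the default
heap_fraction of 0.25 are assumptions derived from that single example, not
a rule from the Solr documentation. The right split depends on the workload
and should be measured.

```python
from typing import Tuple


def split_ram(physical_gb: float, heap_fraction: float = 0.25) -> Tuple[float, float]:
    """Split a node's physical RAM into (heap_gb, disk_cache_gb).

    heap_fraction is a hypothetical knob; 0.25 reproduces the 8 GB / 24 GB
    split from the 32 GB example in the thread.
    """
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be strictly between 0 and 1")
    heap_gb = physical_gb * heap_fraction
    # Whatever is not given to the JVM stays available to the OS for caching
    # index files read from disk.
    return heap_gb, physical_gb - heap_gb


if __name__ == "__main__":
    heap, cache = split_ram(32)
    index_gb = 25  # per-node index size reported in the original question
    print(f"heap: {heap:.0f} GB, left for disk cache: {cache:.0f} GB")
    print(f"share of a {index_gb} GB index that could be cached: {min(1.0, cache / index_gb):.0%}")
```

With the numbers from the thread, the 24 GB left to the operating system is
close to the ~25 GB of index data on each node, which is the point of the
advice: most of the index can stay in the disk cache.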
