I would have thought it would be a huge hassle, but it isn’t. We build a cluster with ArgoCD and Kubernetes deployments. The nodes all report to DataDog. The only real bother is that it is hard to go directly to the admin UI on a specific node; something doesn’t work right with the hostnames and permissions from outside.
Oh, and we run blue/green clusters, so there are two of these beasts.

The graphical display in the admin UI is pretty impressive. Here is a view of part of it.

https://www.dropbox.com/scl/fi/99xfgek24qocowhft6q7b/Screenshot-2023-09-14-at-7.27.16-AM.png?rlkey=nmjyrl9z0n92lgidfei45vq4q&dl=0

The collection currently has about 2.5 billion documents. When I worked at Infoseek, our index of the entire web was 12 million documents. This is at LexisNexis.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Sep 13, 2023, at 11:47 PM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
>
> @Walter,
>
> how on earth are you monitoring all vital Solr Cloud parameters for 320 shards?
>
> Regards,
> Bernd
>
>
> On 13.09.23 at 16:22, Walter Underwood wrote:
>> This is all great advice.
>>
>> There is no optimal number of shards. I’ve run clusters with 4 shards, we
>> currently have one cluster with 96 shards and one with 320 shards. The next
>> one we build out will probably not be sharded.
>>
>> With long queries, I’ve usually seen a roughly linear speedup with sharding.
>> Double the shards, halve the response time.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Sep 13, 2023, at 4:48 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>>>
>>> Hi,
>>>
>>> There are no hard rules wrt sharding, it often comes down to measuring and
>>> experimenting for your workload.
>>>
>>> There are other things to consider than shard size. Why are the queries
>>> slow? How many rows do you ask for? Do you use faceting? Grouping?
>>> You have 25GB of data on each of the 8 nodes/shards. Now, how much RAM does
>>> each node have, and how much RAM did you allocate to Solr/Java?
>>> A common mistake is to allocate too much RAM/heap to Solr, so that you don't
>>> get any virtual memory caching in Linux.
>>> Say you have 32GB of physical RAM on the nodes. Then do not give 30 of
>>> those to Solr. Instead give 8GB to Solr and let 24GB be available for disk
>>> caching.
>>>
>>> Another thing to consider is whether your queries can be
>>> optimized by rewriting them to more efficient equivalents. Sometimes,
>>> Solr-level caches can also help.
>>>
>>> Wrt shards efficiency: If you already have 8 shards, it is not much more
>>> expensive to go to 16, but you increase the risk of a single failure
>>> affecting your requests...
>>>
>>> Jan
>>>
>>>> On 13 Sep 2023 at 10:32, Saksham Gupta
>>>> <saksham.gu...@indiamart.com.INVALID> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I have been trying to reduce the response time of Solr Cloud (v8.10, 8
>>>> nodes). To achieve this, I have tried increasing the number of shards,
>>>> which can help reduce the data size on each shard, thereby reducing
>>>> response time.
>>>>
>>>>
>>>> I have encountered a few questions regarding sharding strategy:
>>>>
>>>> 1. How to decide the ideal number of shards? Is there a minimum or maximum
>>>> number of shards which should be used?
>>>>
>>>> 2. What is the minimum size of a shard after which reducing the size
>>>> further won't have any effect on the response time (as time taken by other
>>>> factors like data aggregation will compensate for that)?
>>>>
>>>> 3. Is there some maximum limit to the size of data that should be kept in a
>>>> shard?
>>>>
>>>>
>>>> As of now we have 8 shards, each on a separate node, with ~25 GB of
>>>> data (15-16 million docs) present on each shard. Please advise me of the
>>>> standard approaches to define the number of shards and shard size. Thanks
>>>> in advance.
>>>
>
> --
> *************************************************************
> Bernd Fehling                    Bielefeld University Library
> Dipl.-Inform. (FH)                LibTec - Library Technology
> Universitätsstr. 25                  and Knowledge Management
> 33615 Bielefeld
> Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
> https://www.ub.uni-bielefeld.de/~befehl/
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
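
For anyone who wants to play with the numbers in this thread, here is a rough Python sketch of the two rules of thumb quoted above: Jan's heap versus disk-cache split, and the roughly linear sharding speedup for long queries. The function names are made up for illustration, and the model is deliberately naive: it assumes everything outside the Java heap is available to the Linux page cache, and that the speedup stays linear, which nobody here claims as a guarantee.

```python
def split_memory(physical_gb: float, heap_gb: float) -> dict:
    """Split a node's RAM into Solr heap and what is left for disk caching.

    Assumption: all RAM not given to the heap is available to the OS page
    cache (ignores the OS itself and any other processes on the node).
    """
    if heap_gb <= 0 or heap_gb >= physical_gb:
        raise ValueError("heap must be positive and smaller than physical RAM")
    return {
        "heap_gb": heap_gb,
        "page_cache_gb": physical_gb - heap_gb,
        "heap_fraction": heap_gb / physical_gb,
    }


def index_fits_in_cache(index_gb: float, page_cache_gb: float) -> bool:
    """True if the on-disk index could be held entirely in the page cache."""
    return index_gb <= page_cache_gb


def estimated_response_ms(baseline_ms: float, baseline_shards: int,
                          new_shards: int) -> float:
    """Scale a measured response time by the shard ratio.

    Assumption: the 'double the shards, halve the response time' rule of
    thumb for long queries holds exactly; measure before trusting it.
    """
    if baseline_shards <= 0 or new_shards <= 0:
        raise ValueError("shard counts must be positive")
    return baseline_ms * baseline_shards / new_shards


if __name__ == "__main__":
    mem = split_memory(physical_gb=32, heap_gb=8)
    print(mem)
    print(index_fits_in_cache(index_gb=25, page_cache_gb=mem["page_cache_gb"]))
    print(estimated_response_ms(baseline_ms=400, baseline_shards=8, new_shards=16))
```

With the hypothetical 32 GB node and an 8 GB heap, 24 GB is left for caching, which is slightly less than the ~25 GB of index per node mentioned in the original question. The 400 ms baseline is an invented figure, not a measurement from the thread.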