Hi Shawn,

Thank you for your explanations and advice.

You are right, the terminology I used for shard vs. replica was wrong.

I'm trying to understand and fix intermittent spikes in heap memory usage
and thread count.
When the issue occurs, a heap dump shows a lot of threads building the
filterCache, each of them consuming a lot of heap memory.
As a consequence, even though the heap is not full, there are several
consecutive full GCs.
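For context, here is my rough back-of-the-envelope estimate (an assumption on my part, not measured from the heap dump) of why the filterCache is so heavy at this shard size, if each entry is stored as a full bitset of one bit per document:

```python
# Rough filterCache sizing sketch (assumption: each cache entry is a
# full bitset, one bit per document in the shard; sparse sets cost less).
docs_per_shard = 100_000_000

# One full-bitset entry, converted bits -> bytes -> MB.
entry_mb = docs_per_shard / 8 / 1024 / 1024        # about 11.9 MB

# Hypothetical filterCache size of 512 entries, worst case all bitsets.
cache_entries = 512
worst_case_gb = entry_mb * cache_entries / 1024    # about 6 GB per replica

print(f"per entry: {entry_mb:.1f} MB, worst case: {worst_case_gb:.1f} GB")
```

So a handful of replicas rebuilding such a cache concurrently could plausibly explain the spikes I see.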

In the Solr log files, I also see CDCR-related messages, and I see
concurrent searchers opening within a few seconds on the same replica.

I don't understand these searchers opening within the same second, since
autoSoftCommit maxTime is 60000 and autoCommit maxTime is 300000 with
openSearcher set to false.
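To be explicit, here is how I understand those settings are expressed in solrconfig.xml (values as stated above; the element layout is the standard Solr update handler configuration, not copied from my actual file):

```xml
<!-- Hard commit every 5 minutes, without opening a new searcher -->
<autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit (which does open a new searcher) every 60 seconds -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>
```

With this configuration I would expect at most one searcher per minute per replica, which is why the log entries below surprise me.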


solr.log.18:394492:2022-11-23 14:45:37.134 INFO  (zkCallback-5-thread-128)
[   ] o.a.s.h.CdcrLeaderStateManager Received new leader state @ SSSS:shard1
solr.log.18:418525:2022-11-23 14:45:48.533 INFO  (Thread-2218) [c:SSSS
s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher
Opening [Searcher@39764e3a[SSSS_shard1_replica_t89] main]
solr.log.18:418626:2022-11-23 14:45:49.535 INFO  (Thread-2218) [c:SSSS
s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher
Opening [Searcher@3dd70403[SSSS_shard1_replica_t89] main]
solr.log.18:418667:2022-11-23 14:45:50.090 INFO
 (recoveryExecutor-4-thread-13-processing-n:no2fyy27.noe.edf.fr:8984_solr
x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1
r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening
[Searcher@4a51a759[SSSS_shard1_replica_t89] main]
solr.log.18:418682:2022-11-23 14:45:50.153 INFO
 (recoveryExecutor-4-thread-13-processing-n:no2fyy27.noe.edf.fr:8984_solr
x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1
r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.h.CdcrRequestHandler Solr
core is being closed - shutting down CDCR handler @ SSSS:shard1


or

solr.log.18:454048:2022-11-23 14:59:21.351 INFO  (Thread-2668) [c:SSSS
s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher
Opening [Searcher@273ebecf[SSSS_shard1_replica_t89] main]
solr.log.18:454403:2022-11-23 14:59:21.993 INFO  (Thread-2668) [c:SSSS
s:shard1 r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher
Opening [Searcher@711992c5[SSSS_shard1_replica_t89] main]
solr.log.18:454484:2022-11-23 14:59:22.588 INFO
 (recoveryExecutor-4-thread-17-processing-n:no2fyy27.noe.edf.fr:8984_solr
x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1
r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.s.SolrIndexSearcher Opening
[Searcher@5258502d[SSSS_shard1_replica_t89] main]
solr.log.18:454502:2022-11-23 14:59:22.609 INFO
 (recoveryExecutor-4-thread-17-processing-n:no2fyy27.noe.edf.fr:8984_solr
x:SSSS_shard1_replica_t89 c:SSSS s:shard1 r:core_node90) [c:SSSS s:shard1
r:core_node90 x:SSSS_shard1_replica_t89] o.a.s.h.CdcrRequestHandler Solr
core is being closed - shutting down CDCR handler @ SSSS:shard1



And finally, yes, there are two Solr instances per server.
The architecture is:
* 7 servers (96 GB RAM / 12 CPUs)
* 14 Solr instances (24 GB heap each)

The huge collection is sharded into 14 shards x 2 replicas, all TLOG.
Total number of documents: 1.5 billion (about 100 million per shard)
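Given your point about leaving memory for disk caching, the per-server split implied by this architecture works out as follows (simple arithmetic from the numbers above, ignoring other processes on the box):

```python
# Per-server memory split: 2 Solr instances x 24 GB heap on 96 GB RAM.
ram_gb = 96
instances_per_server = 2
heap_gb = 24

total_heap_gb = instances_per_server * heap_gb   # 48 GB locked in JVM heaps
left_for_os_gb = ram_gb - total_heap_gb          # ~48 GB left for OS page cache

print(f"heaps: {total_heap_gb} GB, remaining for disk cache: {left_for_os_gb} GB")
```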

Regards

Dominique

On Fri, Dec 16, 2022 at 09:43, Shawn Heisey <[email protected]> wrote:

> On 12/15/22 12:43, Dominique Bejean wrote:
> > I have a sharded collection distributed over several solr nodes. Each
> solr
> > node hosts one shard and one replica of another shard. shards are huge
> (100
> > millions documents). Queries are using several filterQuery. filterCache
> for
> > this number of documents can use high amount of heap memory.
>
> Terminology nit:  Each node is hosting two replicas of different shards.
>   It's not "a shard and a replica" ... it's "two replicas of shard N".
> One replica becomes leader, but that is a mutable temporary distinction.
>   Any NRT or TLOG replica can become leader.
>
> > Is it a good idea to split shards by 2 or 4 in order to have shards with
> 50
> > or 25 millions documents ?
> > With a split by 4, a Solr node will host 8 replicas instead of 2, but
> with
> > smaller filterCache for each replica.
>
> The total amount of heap memory required for the filterCache will not be
> reduced by this.  And there are other per-core memory structures that
> would need more total heap memory, not less.
>
> If you have a LOT of CPU capacity and a fairly low query rate, more
> shards per Solr instance can yield a net increase in query performance.
> But if the query rate is high or the CPU core count is low, you're
> probably better off with fewer shards.
>
> > I don't expect to have better search performances, but I expect to have
> > faster warming and mainly less impacted heap memory by open searcher
> during
> softcommit.  For instance, instead of having one large filterCache warming
> > up once each minute, 4 smaller filterCaches will warm up not at the same
> > time (hopefully).
>
> Multiple smaller caches at the same time might warm a bit faster if
> system capacities are sufficient.  But you seem to be under the
> impression that running eight smaller indexes instead of two larger
> indexes will drop your heap requirement.  It won't.  It would actually
> increase it, and due to efficiencies in the way Lucene builds each
> index, it would also increase your disk space requirements.  I don't
> have any way of knowing how much these things would increase.
>
> Multiple threads/processes is the best way to increase Solr indexing
> performance.  Memory is the best way to increase general Solr
> performance -- lots of extra memory for disk caching, especially with
> really big indexes.  Which might mean greatly increasing the server
> count and also increasing the shard count, so each server's indexes are
> smaller and fit into the disk cache better.
>
> At least you haven't mentioned running multiple Solr instances per
> machine.  That only makes sense for REALLY large installs.  I'm sorry to
> tell you that while yours is not small, we've heard from people who have
> billions of documents and terabytes of index data when counting only one
> replica of each shard.  For now Solr sees better memory efficiency by
> keeping the heap below 32GB, so for really large systems, two Solr
> instances each with a 31GB heap actually has MORE memory available for
> Solr than one instance with a 64GB heap.
>
> Thanks,
> Shawn
>
