Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-16 Thread mmb1234
> Are yours growing always, on all nodes, forever? Or is it one or two who ends up in a bad state?

Randomly on some of the shards and some of the followers in the collection. Then whichever tlog was open on the follower when it was the leader, that one doesn't stop growing. And that shard had active

Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-16 Thread matthew sporleder
I've run into this (or a similar) issue in the past (Solr 6? I don't remember exactly), where tlogs get stuck growing indefinitely and/or refuse to commit on restart. What I ended up doing was writing a monitor to check the number of tlogs and alert if they got over some limit (100 or wh
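
A monitor of the kind described here could look roughly like the sketch below. It is not the script from this post. It assumes that transaction log files sit directly in a core's tlog directory and are named with a `tlog.` prefix, and that the directory path is supplied by the caller. The byte limit is an extra check added for illustration; the post only mentions a file-count limit.

```python
from pathlib import Path


def count_tlogs(tlog_dir):
    """Return (file_count, total_bytes) for files named tlog.* in tlog_dir.

    Assumption: tlog files live directly in this directory and share
    the "tlog." filename prefix.
    """
    files = [p for p in Path(tlog_dir).glob("tlog.*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def check_tlogs(tlog_dir, max_files=100, max_bytes=10 * 1024**3):
    """Return a list of alert messages; an empty list means within limits.

    max_files and max_bytes are illustrative defaults, not recommendations.
    """
    count, size = count_tlogs(tlog_dir)
    alerts = []
    if count > max_files:
        alerts.append(f"{tlog_dir}: {count} tlog files (limit {max_files})")
    if size > max_bytes:
        alerts.append(f"{tlog_dir}: {size} bytes of tlogs (limit {max_bytes})")
    return alerts
```

Run from cron or a similar scheduler against each core's tlog directory, any non-empty result can be forwarded to whatever alerting is already in place.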

Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-16 Thread mmb1234
Looks like the problem is related to tlog rotation on the follower shard. We did the following for a specific shard.

0. start solr cloud
1. solr-0 (leader), solr-1, solr-2
2. rebalance to make solr-1 as preferred leader
3. solr-0, solr-1 (leader), solr-2

The tlog file on solr-0 kept on growing i

Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-14 Thread mmb1234
We found that for the shard that does not get a leader, the tlog replay did not complete for hours (we don't see the "log replay finished", "creating leader registration node", "I am the new leader" etc. log messages). Also not sure why the tlogs are tens of GB (anywhere from 30 to 40 GB). Collectio

Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-13 Thread mmb1234
By tracing the output in the log files we see the following sequence.

Overseer role list has POD-1, POD-2, POD-3 in that order.
POD-3 has 2 shard leaders.
POD-3 restarts.

A) Logs for the shard whose leader moves successfully from POD-3 to POD-1

On POD-1: o.a.s.c.ShardLeaderElectionContext Replay

Re: Down Replica is elected as Leader (solr v8.7.0)

2021-02-11 Thread Rahul Goswami
I haven’t delved into the exact reason for this, but what generally helps to avoid this situation in a cluster is:

i) During shutdown (in case you need to restart the cluster), let the overseer node be the last one to shut down.
ii) While restarting, let the Overseer node be the first one to start i
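
The ordering suggested here can be written down as a small helper. This is only a sketch of the advice in this message, not a tested procedure. The node names are placeholders, and identifying the current overseer is left to the caller (the Collections API `OVERSEERSTATUS` action is one place to look).

```python
def restart_order(nodes, overseer):
    """Return (shutdown_order, startup_order) for a full cluster restart.

    The overseer node is shut down last and started first; the relative
    order of the other nodes is kept as given.
    """
    if overseer not in nodes:
        raise ValueError(f"overseer {overseer!r} is not in the node list")
    others = [n for n in nodes if n != overseer]
    shutdown_order = others + [overseer]
    startup_order = [overseer] + others
    return shutdown_order, startup_order
```

For example, with nodes `["POD-1", "POD-2", "POD-3"]` and `POD-1` as overseer, the shutdown order ends with `POD-1` and the startup order begins with it.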

Down Replica is elected as Leader (solr v8.7.0)

2021-02-10 Thread mmb1234
Hello,

On reboot of one of the solr nodes in the cluster, we often see a collection's shards with

1. LEADER replica in DOWN state, and/or
2. shard with no LEADER

Output from /solr/admin/collections?action=CLUSTERSTATUS is below. Even after 5 to 10 minutes, the collection often does not recover.
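
The two symptoms listed above can be checked automatically against the CLUSTERSTATUS response. The sketch below is an illustration, not something from the original post. It assumes the JSON response nests replicas under `cluster` > `collections` > collection > `shards` > shard > `replicas`, with each replica carrying a `state` field and the leader marked by `leader` set to `"true"`. The `base_url` argument is a placeholder for a real node address.

```python
import json
import urllib.request


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict. base_url is e.g. a node's /solr root."""
    url = f"{base_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_leader_problems(cluster_status):
    """Return (collection, shard, problem) tuples for unhealthy shards.

    Flags shards with no leader replica and shards whose leader replica
    is in any state other than "active".
    """
    problems = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            leaders = [
                (name, replica)
                for name, replica in shard.get("replicas", {}).items()
                if str(replica.get("leader", "")).lower() == "true"
            ]
            if not leaders:
                problems.append((coll_name, shard_name, "no leader"))
            for name, replica in leaders:
                state = replica.get("state")
                if state != "active":
                    problems.append(
                        (coll_name, shard_name, f"leader {name} is {state}")
                    )
    return problems
```

Calling `find_leader_problems(fetch_cluster_status(base_url))` a few minutes after a node reboot would show whether the collection has recovered or is still stuck in one of the two states.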