Re: How to fix unstable solr cluster

Sarthak Sharma Thu, 16 May 2024 12:02:03 -0700

A correction: the Solr version is actually 4.8.1. Sorry for the miss.


On Tue, May 14, 2024 at 10:55 AM Sarthak Sharma <[email protected]>
wrote:

> Hi Team,
>
> Gentle reminder: we would appreciate help with this issue, as we are
> stuck. The Solr admin UI is also not opening, so we cannot debug further;
> it throws this error:
>
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer/collection-queue-work/qn-
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:243)
>     at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:240)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>     at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:240)
>     at org.apache.solr.cloud.DistributedQueue.createData(DistributedQueue.java:311)
>     at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:330)
>     at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:344)
>     at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:309)
>     at org.apache.solr.handler.admin.CollectionsHandler.handleClusterStatus(CollectionsHandler.java:628
>
> Our preference is to bring the cluster back to a stable state with low
> downtime and no data loss, or at least to get the admin UI opening again.
> Please suggest.
>
> Thanks
>
> On Sun, May 12, 2024 at 9:40 PM Sarthak Sharma <[email protected]>
> wrote:
>
>> Hi,
>>
>> We have a production Solr cluster with 4 shards and 4 replicas on a
>> legacy stack.
>> The same machines also host a 5-node ZooKeeper ensemble.
>>
>> Solr version : 4.8.1
>> ZK version : 3.4.6
>>
>> A few days ago, one of the Solr processes got stuck, which caused reads
>> and writes to fail. We did a few rounds of ZK/Solr restarts after
>> clearing up disk space, and read operations started working again.
>> However, write operations (indexing) started failing with the error below:
>>
>> Caused by:
>> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> ClusterState says we are the leader (<host>:<port>/solr/<collection-name>),
>> but locally we don't think so. Request came from null
>>
>> We checked the cluster state using the steps below (the Solr UI is not
>> accessible for some reason):
>>
>>
>>    1. Go to the path "zookeeper/installation/directory/bin"
>>    2. ./zkCli.sh -server localhost:1234
>>    3. get /clusterstate.json
>>
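
For anyone reading along: a minimal sketch of how the JSON returned by step 3 could be summarized to list the replicas that are not 'active'. It assumes the usual Solr 4.x /clusterstate.json layout (collection -> "shards" -> shard -> "replicas" -> replica with a "state" field). The sample data, host names, and the function name non_active_replicas are made up for illustration and are not taken from our cluster. Only the JSON part of the zkCli output should be passed in, since zkCli normally prints znode stat lines after the data.

```python
import json

# Hypothetical sample shaped like a Solr 4.x /clusterstate.json.
# Collection, shard, replica, and host names are invented for illustration.
SAMPLE = """
{
  "collection1": {
    "shards": {
      "shard1": {
        "state": "active",
        "replicas": {
          "core_node1": {"state": "active", "node_name": "host1:8983_solr", "leader": "true"},
          "core_node2": {"state": "recovering", "node_name": "host2:8983_solr"}
        }
      }
    }
  }
}
"""


def non_active_replicas(clusterstate_text):
    """Return (collection, shard, replica, node_name, state) tuples
    for every replica whose state is not 'active'."""
    state = json.loads(clusterstate_text)
    found = []
    for coll_name, coll in sorted(state.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            for rep_name, rep in sorted(shard.get("replicas", {}).items()):
                if rep.get("state") != "active":
                    found.append((coll_name, shard_name, rep_name,
                                  rep.get("node_name"), rep.get("state")))
    return found


if __name__ == "__main__":
    # Print one line per replica that needs attention.
    for row in non_active_replicas(SAMPLE):
        print(*row)
```

This only reads and reports state; it does not change anything in ZooKeeper or Solr.
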
>>
>> We see that 3 shard replica nodes are stuck in the 'recovering' state.
>>
>> The cluster is critically important, and we want to make the smallest and
>> safest possible changes to bring it back to a stable state. Upgrading
>> versions is not possible either.
>>
>>
>> Please help us understand this behavior and the way out of it.
>> Fixing this issue is critical and urgent, and we don't have enough
>> Solr expertise in the team.
>> This cluster was mainly in maintenance mode and on a deprecation path,
>> hence the situation.
>>
>> Is there a way to force replication from a stable node to an unstable node?
>> Please let me know your thoughts. We really appreciate any help.
>>
>> Thanks,
>> Sarthak
>>
>
