Hi Reej,

You can't change this behavior; this is how ZooKeeper works. You need at
least *(2*N + 1)* nodes in the ZooKeeper ensemble if you want to
tolerate *N* ZooKeeper node failures.

This is also documented in the Solr Reference Guide:
https://solr.apache.org/guide/8_8/setting-up-an-external-zookeeper-ensemble.html
"For a ZooKeeper service to be active, there must be a majority of
non-failing machines that can communicate with each other. *To create a
deployment that can tolerate the failure of F machines, you should count on
deploying 2xF+1 machines*."
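To make the 2F+1 arithmetic concrete, here is a small sketch that computes the majority and the number of tolerated failures for a few ensemble sizes. The class and method names are illustrative only; this is plain arithmetic, not ZooKeeper code.

```java
public class QuorumMath {
    // Strict majority needed for an ensemble of n voting servers to stay active.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // How many servers can fail while a majority still survives.
    static int toleratedFailures(int n) {
        return n - quorum(n);
    }

    // Smallest ensemble that survives f failures, i.e. 2f + 1.
    static int ensembleFor(int f) {
        return 2 * f + 1;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("ensemble=%d quorum=%d tolerates=%d%n",
                    n, quorum(n), toleratedFailures(n));
        }
    }
}
```

For 3 servers the majority is 2, so only 1 failure is tolerated; 5 servers tolerate 2. Even sizes add nothing: 4 servers still tolerate only 1 failure, which is why odd ensemble sizes are used.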

This restriction is by design. If ZooKeeper could answer requests with just
one node alive, then during a network partition each side could keep
accepting updates on its own, and the ensemble could no longer guarantee a
single consistent view of the cluster state.
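The reasoning can be checked with a toy model (a hypothetical class, not ZooKeeper code): split an ensemble into two sides by a network partition and count how many splits would let both sides keep serving requests. Under the majority rule no split qualifies, because two disjoint sides can never both hold n/2 + 1 servers. Under a hypothetical rule that lets any single live node serve, some splits leave both sides active, which is the split-brain case the majority rule exists to prevent.

```java
public class PartitionCheck {
    // Toy rule: a side of a partition may serve requests only if it
    // holds at least minActive servers.
    static boolean active(int sideSize, int minActive) {
        return sideSize >= minActive;
    }

    // Counts the splits (a, n - a) of n servers in which BOTH sides
    // would be allowed to serve, i.e. a split brain is possible.
    static int splitBrainCases(int n, int minActive) {
        int count = 0;
        for (int a = 0; a <= n; a++) {
            if (active(a, minActive) && active(n - a, minActive)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 3;
        int majority = n / 2 + 1;
        // Majority rule: no split lets both sides stay active.
        System.out.println("majority rule: " + splitBrainCases(n, majority));
        // Hypothetical "one live node is enough" rule: splits (1,2) and (2,1) both qualify.
        System.out.println("single-node rule: " + splitBrainCases(n, 1));
    }
}
```

For a 3-node ensemble this prints 0 for the majority rule and 2 for the single-node rule.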

That said, if you still want ZooKeeper to keep working when 2 nodes are
down (which should be fairly unlikely in most cases), increase your
ZooKeeper ensemble to 5 nodes.
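Independently of the ensemble size, the stuck WebLogic thread itself can be avoided by never letting the monitoring thread wait on the ping without a limit. Below is a minimal sketch of that idea using only `java.util.concurrent`. It is not a SolrJ feature, and `BoundedPing`, `pingWithTimeout` and the timeout values are hypothetical. The ping runs on a separate daemon thread and the caller gives up after a fixed budget, returning -1 the same way `pingRepo` does on failure. In `pingRepo` the `Callable` would wrap something like `ping.process(client).getQTime()`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedPing {
    // Daemon threads, so a ping that never returns cannot keep the JVM alive.
    private static final ExecutorService PING_POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "solr-ping");
        t.setDaemon(true);
        return t;
    });

    // Runs the ping on a separate thread and waits at most timeoutMs for it.
    // Returns the ping's result (e.g. QTime), or -1 on timeout or failure.
    static int pingWithTimeout(Callable<Integer> ping, long timeoutMs) {
        Future<Integer> future = PING_POOL.submit(ping);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; a blocked network call may ignore this and keep running.
            future.cancel(true);
            return -1;
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and abandon the ping.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return -1;
        } catch (ExecutionException e) {
            // The ping itself threw (e.g. Solr or ZooKeeper unreachable).
            return -1;
        }
    }

    public static void main(String[] args) {
        // Fast ping: completes well within its budget and returns its value.
        System.out.println(pingWithTimeout(() -> 5, 1_000));
        // Hanging ping, simulated with a long sleep: abandoned after 200 ms.
        System.out.println(pingWithTimeout(() -> {
            Thread.sleep(60_000);
            return 5;
        }, 200));
    }
}
```

A cached pool is used so that one hung ping does not block the next scheduled one. The trade-off is that `cancel(true)` only interrupts the worker: a network call that ignores interrupts keeps holding its thread until it returns on its own. The monitor still gets its -1 on time and can switch to DB search.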

Thanks,
Vinay

On Tue, Jan 25, 2022 at 2:15 PM Reej Nayagam wrote:

> Hi All,
>
> We are using a Solr 8.8.2 cloud setup with a ZooKeeper 3.6.3 ensemble (3 ZK servers).
> We have an HA monitoring system that pings every 5 minutes to check whether the
> Solr URL or shards are available and the ping succeeds. If there is a failure
> (e.g. the Solr servers are down), it flags a switch from Solr search
> to DB search.
>
> Now we have an issue. We connect to Solr by passing the ZK hosts. If
> 2 of the ZooKeepers are down and we call SolrPing.process(client), the call
> keeps processing for a long time, as if all the ZK servers were down, ends up
> as a stuck thread, and slows down the WebLogic application server.
>
> This works fine when a majority of the
> ZooKeepers are up and running.
>
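> Method snippet:
>
> import org.apache.solr.client.solrj.SolrClient;
> import org.apache.solr.client.solrj.impl.CloudSolrClient;
> import org.apache.solr.client.solrj.request.SolrPing;
> import org.apache.solr.client.solrj.response.SolrPingResponse;
>
> // Returns the ping QTime in ms, or -1 if the ping fails.
> private int pingRepo(String zkHostList, String coreName) {
>     // SolrConnectionUtil is our own helper that builds the client from the ZK hosts
>     SolrClient client = SolrConnectionUtil.getSolrClient(zkHostList, "ping");
>     ((CloudSolrClient) client).setDefaultCollection(coreName);
>     SolrPing ping = new SolrPing();
>     try {
>         SolrPingResponse resp = ping.process(client);
>         return resp.getQTime();
>     } catch (Exception e) {
>         // Any failure (Solr or ZooKeeper unreachable) is reported as -1.
>     }
>     return -1;
> }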
>
> Can anyone suggest how to handle this when the ZooKeepers are down?
> Thank you.
>
> Regards
> Reej
>
> Sent from my iPhone
