On 9/14/2013 6:57 AM, Prasi S wrote:
> I use SolrPingResponse.getStatus method to start indexing to solr. I use
> SolrCloud with external zookeeper
> 
> If i send it to the Zookeeper, if zookeeper is down, it returns NOTOK.
> 
> But if one of my solr is up and second solr is down, the Ping returns OK
> status.

If your zookeeper is completely down or does not have quorum, then
SolrCloud isn't going to work right, so a ping response of NOTOK is correct.

A fully redundant zookeeper ensemble is at least three machines,
preferably an odd number.  You can run zookeeper on the same hardware as
Solr, but it is recommended that it be a standalone process.  You should
not run the zookeeper embedded in Solr (-DzkRun) for production, because
when you shut down or restart Solr, the embedded zookeeper also goes down.

With three machines in the zookeeper ensemble, you can have one of them
go down and everything keeps working perfectly.

If you want to know why an odd number is recommended, consider a
scenario with four zookeepers instead of three.  In order to have
quorum, you need a strict majority (more than half) of the servers
available.  On a
four-server ensemble, that works out so that three of them have to be
running.  You are no better off than if you have three servers, because
in either scenario you can only have one failure.  On top of that, you
have an extra possible point of failure and you're using more resources,
like switchports and power.  With five servers, two can go down and
quorum will be maintained.

If you only have two zookeepers, they both must be operational in order
to have quorum.  If one of them were to fail, quorum would be lost and
SolrCloud would stop working correctly.
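
The quorum arithmetic described above can be checked with a few lines of
Python.  This is only an illustrative sketch: the function names are made
up for this example and are not part of zookeeper or Solr.  It assumes
quorum means a strict majority of the configured servers.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while quorum is still held."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare the ensemble sizes discussed in this message.
for n in (2, 3, 4, 5):
    print(f"{n} servers: quorum needs {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Three and four servers both tolerate one failure, while two servers
tolerate none, which is why the fourth server buys you nothing.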

SolrCloud itself is also designed to deal with a failure of a single
machine.  A replicationFactor of at least two is required for that to
work correctly.
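
As a rough sketch of what that looks like, the snippet below builds a
Collections API CREATE request with replicationFactor set to two.  The
host, port, collection name, and shard count are placeholders I picked
for the example, not values from your setup, and the code only builds
the URL without sending it.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust the host, port, and collection name.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name: str, num_shards: int,
                          replication_factor: int) -> str:
    """Build a Collections API CREATE URL (does not send the request)."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # Two copies of each shard, so one Solr machine can fail.
        "replicationFactor": replication_factor,
    })
    return f"{SOLR_BASE}/admin/collections?{params}"


print(create_collection_url("mycollection", 1, 2))
```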

Thanks,
Shawn
