On 9/14/2013 6:57 AM, Prasi S wrote:
> I use SolrPingResponse.getStatus method to start indexing to solr. I use
> SolrCloud with external zookeeper
>
> If i send it to the Zookeeper, if zookeeper is down, it returns NOTOK.
>
> But if one of my solr is up and second solr is down, the Ping returns OK
> status.

If your ZooKeeper is completely down or does not have quorum, then SolrCloud isn't going to work right, so a ping response of NOTOK is correct.

A fully redundant ZooKeeper ensemble is at least three machines, preferably an odd number. You can run ZooKeeper on the same hardware as Solr, but it is recommended that it be a standalone process. You should not run the Solr embedded ZooKeeper (-DzkRun) for production, because when you shut down or restart Solr, the embedded ZooKeeper also goes down. With three machines in the ZooKeeper ensemble, you can have one of them go down and everything keeps working perfectly.

If you want to know why an odd number is recommended, consider a scenario with four ZooKeeper servers instead of three. In order to have quorum, more than half of the servers (a strict majority) must be available. On a four-server ensemble, that works out so that three of them have to be running. You are no better off than if you have three servers, because in either scenario you can only have one failure. On top of that, you have an extra possible point of failure and you're using more resources, like switchports and power. With five servers, two can go down and quorum will be maintained.

If you only have two ZooKeeper servers, they both must be operational in order to have quorum. If one of them were to fail, quorum would be lost and SolrCloud would stop working correctly.

SolrCloud itself is also designed to deal with the failure of a single machine. A replicationFactor of at least two is required for that to work correctly, so that every shard still has a live replica when one Solr node is down. That is why the ping still returns OK in your second case.

Thanks,
Shawn
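
For anyone who wants to check the quorum arithmetic above for other ensemble sizes, here is a minimal sketch in Python. It only illustrates the majority rule. The function names quorum_size and tolerated_failures are made up for this example and are not part of ZooKeeper or Solr.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Hypothetical ensemble sizes, matching the ones discussed in the text.
for n in (2, 3, 4, 5):
    print(f"{n} servers: quorum needs {quorum_size(n)}, "
          f"can lose {tolerated_failures(n)}")
```

Three and four servers both tolerate exactly one failure, which is why the fourth server adds cost without adding redundancy.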