I want to add one more thing to Shawn's explanation of ZooKeeper. In order to have quorum, you need to have half the servers plus one available. To see why, assume you have four ZooKeeper machines and a network partition splits them into two pairs: the two nodes in each pair can communicate with each other, but the two pairs cannot communicate with each other. If each pair were allowed to keep working on its own, the result would be a split brain. With the quorum rule, neither pair has the three servers it needs, so neither one continues. So the rule is simple: you need half the servers plus one available because there can never be two different sets at the same time that each contain half the servers plus one. There can be only one.
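The partition case can be checked with a few lines of Python. This is only a sketch of the majority arithmetic, not ZooKeeper's actual code; the function names quorum_size and sides_with_quorum are made up for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Half the servers plus one, using integer division."""
    return ensemble_size // 2 + 1


def sides_with_quorum(ensemble_size: int, partition_sizes: list[int]) -> list[int]:
    """Return the sizes of the partition sides that still have quorum."""
    need = quorum_size(ensemble_size)
    return [size for size in partition_sizes if size >= need]


if __name__ == "__main__":
    print(sides_with_quorum(4, [2, 2]))  # [] : neither pair has 3 servers
    print(sides_with_quorum(4, [3, 1]))  # [3]
    print(sides_with_quorum(5, [3, 2]))  # [3]
```

However the servers are split into two groups, the returned list never has more than one entry, which is exactly the "there can be only one" property.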
2013/9/15 Shawn Heisey <s...@elyograg.org>

> On 9/14/2013 6:57 AM, Prasi S wrote:
> > I use SolrPingResponse.getStatus method to start indexing to solr. I use
> > SolrCloud with external zookeeper
> >
> > If i send it to the Zookeeper, if zookeeper is down, it returns NOTOK.
> >
> > But if one of my solr is up and second solr is down, the Ping returns OK
> > status.
>
> If your zookeeper is completely down or does not have quorum, then
> SolrCloud isn't going to work right, so a ping response of NOTOK is
> correct.
>
> A fully redundant zookeeper ensemble is at least three machines,
> preferably an odd number. You can run zookeeper on the same hardware as
> Solr, but it is recommended that it be a standalone process. You should
> not run the solr embedded zookeeper (-DzkRun) for production, because
> when you shutdown or restart Solr, the embedded zookeeper also goes down.
>
> With three machines in the zookeeper ensemble, you can have one of them
> go down and everything keeps working perfectly.
>
> If you want to know why an odd number is recommended, consider a
> scenario with four zookeepers instead of three. In order to have
> quorum, you need to have half the servers plus one available. On a
> four-server ensemble, that works out so that three of them have to be
> running. You are no better off than if you have three servers, because
> in either scenario you can only have one failure. On top of that, you
> have an extra possible point of failure and you're using more resources,
> like switchports and power. With five servers, two can go down and
> quorum will be maintained.
>
> If you only have two zookeepers, they both must be operational in order
> to have quorum. If one of them were to fail, quorum would be lost and
> SolrCloud would stop working correctly.
>
> SolrCloud itself is also designed to deal with a failure of a single
> machine. A replicationFactor of at least two is required for that to
> work correctly.
>
> Thanks,
> Shawn
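
Shawn's numbers for two to five servers can be reproduced with plain arithmetic. A small sketch follows; tolerated_failures is a hypothetical helper name, and nothing here comes from the ZooKeeper or Solr APIs.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still running."""
    quorum = ensemble_size // 2 + 1  # half the servers plus one
    return ensemble_size - quorum


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} servers: quorum {n // 2 + 1}, can lose {tolerated_failures(n)}")
    # 2 servers: quorum 2, can lose 0
    # 3 servers: quorum 2, can lose 1
    # 4 servers: quorum 3, can lose 1
    # 5 servers: quorum 3, can lose 2
```

Going from three to four servers does not increase the number of failures you can survive, which is why an odd number is the usual choice.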