Good questions... From my understanding, queries will work if ZK goes down,
but writes do not work without ZooKeeper. Reads keep working because the
cluster state is cached on each node, so ZooKeeper doesn't participate
directly in query or indexing requests. Refusing writes when it loses its
connection to ZooKeeper is a deliberate safeguard on Solr's part. In other
words, Solr assumes it's reasonably safe to allow reads if the cluster doesn't
have a healthy coordinator, but chooses to disallow writes to be safe.
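
A toy Python model of the read/write policy described above may make it
concrete. It is a sketch under stated assumptions, not Solr's implementation:
`ToySolrNode`, `cached_cluster_state`, `zk_connected`, `query` and `update`
are hypothetical names, and real Solr tracks far more than a
shard-to-replica map.

```python
from dataclasses import dataclass, field


@dataclass
class ToySolrNode:
    """Hypothetical model of the reads-yes / writes-no policy; not Solr code."""

    # Cluster state cached locally from ZooKeeper (assumed shape):
    # shard name -> list of replica node names.
    cached_cluster_state: dict = field(default_factory=dict)
    zk_connected: bool = True

    def query(self, shard: str) -> list:
        # Reads only consult the locally cached cluster state, so they keep
        # working even after the ZooKeeper connection is lost.
        return self.cached_cluster_state.get(shard, [])

    def update(self, doc_id: str) -> str:
        # Writes are refused as a safeguard while ZooKeeper is unreachable,
        # since the node can no longer trust its view of the cluster.
        if not self.zk_connected:
            raise RuntimeError("ZooKeeper unreachable: updates disabled")
        return f"indexed {doc_id}"
```

With `zk_connected` set to `False`, `query("shard1")` still returns the cached
replica list while `update(...)` raises, mirroring the behaviour described above.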

If a Solr node goes down while ZK is unavailable, leader vs. replica doesn't
really matter, since Solr no longer accepts writes anyway. I'd venture to
guess there is some failover logic built in when executing distributed
queries, but I'm not as familiar with that part of the code (I'll brush up
on it, though, as I'm now curious as well).
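
A toy Python sketch of the kind of failover guessed at above: send one request
per shard and, if a replica is unreachable, try the next replica listed in the
(possibly stale) cached cluster state. `distributed_query` and `send` are
hypothetical names, and whether Solr's distributed search actually retries
replicas this way is an assumption, not something confirmed here.

```python
from typing import Callable, Dict, List


def distributed_query(
    cached_state: Dict[str, List[str]],
    send: Callable[[str, str], str],
) -> Dict[str, str]:
    """Toy fan-out: one response per shard, failing over between replicas.

    cached_state maps shard name -> replica node names; while ZooKeeper is
    down it may still list nodes that have died. send(node, shard) returns a
    response, or raises ConnectionError if the node is unreachable.
    """
    results = {}
    for shard, replicas in cached_state.items():
        last_error = None
        for node in replicas:
            try:
                results[shard] = send(node, shard)
                break  # first reachable replica answers for this shard
            except ConnectionError as err:
                last_error = err  # fall over to the next replica
        else:
            # No replica of this shard responded, so the whole query fails.
            raise RuntimeError(f"no live replica for {shard}") from last_error
    return results
```

In this toy model, one node down out of four (2 shards x 2 replicas) still
leaves a reachable replica for every shard, so the query succeeds; it fails
only if both replicas of the same shard are down.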

Cheers,
Tim


On Tue, Nov 19, 2013 at 11:58 AM, Garth Grimm <
garthgr...@averyranchconsulting.com> wrote:

> Given a 4-node Solr instance (i.e. 2 shards, 2 replicas per shard) and a
> standalone ZooKeeper.
>
> Correct me if any of my understanding is incorrect on the following:
> If ZK goes down, most normal operations will still function, since my
> understanding is that ZK isn't involved on a transaction-by-transaction
> basis for each of these:
> Document adds, updates, and deletes on existing collections will still work
> as expected.
> Queries will still get processed as expected.
> Is the above correct?
>
> But adding new collections, changing configs, etc., will all fail while ZK
> is down (or at least, place things in an inconsistent state?)
> Is that correct?
>
> If, while ZK is down, one of the 4 Solr nodes also goes down, will all
> normal operations fail?  Will they all continue to succeed?  I.e. will each
> of the nodes realize which node is down and route indexing and query
> requests around them, or is that impossible while ZK is down?  Will some
> queries succeed (because they were lucky enough to get routed to the one
> replica on the one shard that is still functional) while other queries fail
> (they aren't so lucky and get routed to the one replica that is down on the
> one shard)?
>
> Thanks,
> Garth Grimm
