Hello Martin, Zookeeper is not really designed to survive a partition between two isolated data centers, but rather the loss of a minority of nodes (fewer than half of the ensemble). Therefore, there are several possible approaches to your issue, depending on how your applications use Zookeeper.
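To make the "minority of nodes" point concrete, here is a small sketch of the majority-quorum arithmetic; it also shows why Martin's even split across two zones is problematic:

```python
# Majority-quorum arithmetic: a ZooKeeper ensemble of N voting members
# needs floor(N/2) + 1 votes, so it only tolerates losing a strict minority.

def quorum(n_voters: int) -> int:
    """Votes required for a majority quorum of n_voters members."""
    return n_voters // 2 + 1

def survives(n_voters: int, n_lost: int) -> bool:
    """Can the ensemble still form a quorum after losing n_lost voters?"""
    return n_voters - n_lost >= quorum(n_voters)

# 5-node ensemble: quorum is 3, so it survives the loss of 2 nodes.
print(quorum(5), survives(5, 2))   # → 3 True

# 6 nodes split evenly across two zones: each side has 3 < quorum(6) = 4,
# so neither side can elect a leader or serve writes after a partition.
print(quorum(6), survives(6, 3))   # → 4 False
```

This is why an even ensemble split 3/3 across two zones goes read-only (at best) on either side of a partition, and why keeping all voters in one site avoids the problem.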
What works for me with SolrCloud and our search platform built on it: run the Zookeeper ensemble in virtual machines in one location, and only observer nodes in the other. The network between the sites may be flaky, but in combination with read-only mode for Zookeeper client connections, this provides redundancy and an easy way to handle ensemble changes. If your applications follow the same pattern, updates will mostly occur where the ensemble is located, while the Zookeepers in the redundant location only "listen".

In case of a failure of the main site, simply move the ensemble's virtual machines over to the other location and switch roles. This would probably also mirror what you do with the applications.

With SolrCloud, you would partition your nodes into two node sets, one per data center. Make sure that one set of replicas for each shard is located in one data center, and the other set in the other.

If redundant and reliable feeding really is important (e.g., because the data is not persistent and cannot be reproduced), you will probably be better off with two independent SolrCloud instances, one per data center, and some reliable message delivery in front for feeding.

Finally, also consider the management procedures for handling a virtualized environment with the SolrCloud and Zookeeper nodes. If you employ a cloud management platform that handles service migration between sites, this may be an even easier solution: dead Zookeeper and SolrCloud nodes would automagically pop up, resurrected in the surviving location, should one data center fail.

A really final note: as we face this issue in customer scenarios as well, we are also looking into using Cassandra instances instead of Zookeeper for this purpose. This leads to a somewhat different interaction model between Solr instances, but may be better suited especially for the partitioning case. Bad news: yes, we're on our own with this. No standard support from Solr for Cassandra yet.
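As a minimal sketch of the "ensemble in one site, observers in the other" layout, with hypothetical host names (assume three voters in DC1 and two observers in DC2):

```
# zoo.cfg shared by all nodes -- hypothetical hosts, for illustration only.
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper

# Voting ensemble in the primary data center (forms the quorum):
server.1=zk1.dc1.example.com:2888:3888
server.2=zk2.dc1.example.com:2888:3888
server.3=zk3.dc1.example.com:2888:3888

# Observers in the secondary data center: they serve reads and forward
# writes to the ensemble, but do not vote, so a partition between the
# sites cannot cost the primary site its quorum.
server.4=zk4.dc2.example.com:2888:3888:observer
server.5=zk5.dc2.example.com:2888:3888:observer
```

The observer nodes additionally set `peerType=observer` in their own zoo.cfg. For the read-only behavior during a partition, the read-only mode mentioned above has to be enabled on the servers (the `readonlymode.enabled` system property) and the clients must opt into read-only connections.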
So, there are several approaches to handling this. Which one is best for you depends on your precise topology requirements and platform capabilities. Unfortunately (or luckily for consulting companies in the field :-), there is no single, easy approach that works for all.

Best regards,
--Jürgen

On Wed, Sep 17, 2014 at 1:19 PM, Martin Grotzke <[email protected]> wrote:
>> Hi,
>>
>> is it true, that the reconfig command that's available since 3.5.0 can only
>> be used if there's a quorum?
>>
>> Our situation is that we have 2 datacenters (actually only 2 zones within
>> the same DC) which will be provisioned equally, so that we'll have an even
>> number of ZK nodes (true, not optimal). When 1 zone fails, there won't be a
>> quorum any more and ZK will be unavailable - that's my understanding. Is it
>> possible to add new nodes to the ZK cluster and achieve a quorum again
>> while the failed zone is still unavailable?
>>
>> What would you recommend how to handle this situation?
>>
>> We're using (going to use) SolrCloud as clients.
>>
>> Thanks && cheers,
>> Martin

--
Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С уважением
*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence" & Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: [email protected], URL: www.devoteam.de

Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register: Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
