Re: Is it safe to reuse zookeeper replica ID when reprovisioning?

Alexander Shraer Mon, 01 Apr 2019 17:59:17 -0700

Just wanted to add that it's not important to wait until the replaced node
has fully synced; what's important is to wait until a quorum that doesn't
include it has the latest data before starting the replacement process
(replacing the node is effectively a manual loss of its data).


So, you could logically remove it (this makes it non-voting and ensures
that a quorum that doesn't include it is up to date). Then you can
immediately add it back, even if it isn't fully synced yet. This is
probably also the better choice in case you do have a failure: if C fails
and never recovers, but A has the latest data and B is a voter, then B can
recover from A and they can continue normally.
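A toy model of the remove-then-immediately-re-add sequence may help. It is a sketch under simplifying assumptions, not ZooKeeper's real protocol: each server's state is a single integer standing in for its last zxid, the leader is simply the live voter with the highest value (real election also compares epochs and server IDs), and elect_and_sync is a hypothetical helper.

```python
def elect_and_sync(voters, zxid, live):
    """Toy election: if a strict majority of voters is live, the live voter
    with the highest zxid leads and every other live voter copies its state.
    Returns the leader, or None if there is no quorum (the ensemble waits)."""
    live_voters = voters & live
    if len(live_voters) <= len(voters) // 2:
        return None
    leader = max(sorted(live_voters), key=lambda s: zxid[s])
    for s in live_voters:
        zxid[s] = zxid[leader]
    return leader


LATEST = 5
zxid = {"A": 3, "B": 5, "C": 5}  # only B and C have the latest data
voters = {"A", "B", "C"}

# 1. Logically remove B. Committing the new config {A, C} needs a quorum
#    of it, i.e. both A and C, so A is caught up as a side effect.
voters -= {"B"}
zxid["A"] = LATEST

# 2. Replace B with an empty machine and add it straight back as a voter.
zxid["B"] = 0
voters |= {"B"}

# 3. C fails and never recovers: A and B still form a majority of 3.
leader = elect_and_sync(voters, zxid, live={"A", "B"})
print(leader, zxid)  # A {'A': 5, 'B': 5, 'C': 5}
```

Skipping step 1 changes the ending: A would still be at 3 when C fails, the same election would pick A, and B would sync to the stale state.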




On Mon, Apr 1, 2019 at 5:46 PM Alexander Shraer <[email protected]> wrote:

> Let's say you have nodes A, B, C. Only B and C have the latest data. You're
> trying to replace B.
> You replace B with a new server, but before it's in sync, C fails. What
> happens?
>
> Option 1 (no reconfiguration): A and B are both registered as voting
> members, so they form a majority out of 3, B syncs from A, and they happily
> continue together. Since neither has the latest data, this is data loss.
> Option 2 (with reconfiguration): By logically removing B first, you're
> bringing A up to date, so A and C both have the latest data. A is going
> to be stalled while C is down and will not form a quorum with B, since B
> isn't registered as a voter. If C never recovers, you can recover
> manually by updating the config files.
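The two options can be replayed in a few lines of Python. This is a simplified model, not ZooKeeper code: integers stand in for zxids, the leader is the live voter with the highest one, and replace_b and has_quorum are hypothetical names.

```python
def has_quorum(voters, live):
    """A strict majority of the registered voters must be up."""
    return len(voters & live) > len(voters) // 2


def replace_b(reconfigure_first):
    """Replay the scenario; returns 'stalled', 'ok' or 'data loss'."""
    latest = 5
    zxid = {"A": 3, "B": 5, "C": 5}  # only B and C have the latest data
    voters = {"A", "B", "C"}

    if reconfigure_first:
        # Option 2: removing B commits only once a quorum of {A, C}
        # (both of them) has the new state, so A catches up.
        voters = voters - {"B"}
        zxid["A"] = latest

    zxid["B"] = 0      # B's replacement starts empty
    live = {"A", "B"}  # C fails before B is in sync

    if not has_quorum(voters, live):
        return "stalled"
    # Simplified election: the live voter with the highest zxid leads.
    best = max(zxid[s] for s in voters & live)
    return "ok" if best == latest else "data loss"


print(replace_b(reconfigure_first=False))  # data loss
print(replace_b(reconfigure_first=True))   # stalled
```

In the second case availability is traded for safety: nothing is lost, and the manual config-file edit is the escape hatch if C is gone for good.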
>
>
> On Mon, Apr 1, 2019 at 5:10 PM David Anderson <[email protected]> wrote:
>
>> On Mon, Apr 1, 2019 at 4:48 PM Alexander Shraer <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > I think that one of the problems with the proposed method is that you may
>> > end up having a majority of servers that don't have the latest state
>> > (imagine that there is a minority failure while your replaced
>> > node hasn't been brought up to date yet).
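This concern can be checked mechanically by listing every majority of the voters that contains no server with the latest state. A sketch that treats "latest state" as a plain set of server names (an assumption; ZooKeeper actually compares zxids); stale_majorities is a hypothetical helper.

```python
from itertools import combinations


def stale_majorities(voters, up_to_date):
    """Return every strict majority of `voters` (as a sorted tuple) that
    contains no server from `up_to_date`. An empty list means any quorum
    that can form is guaranteed to include someone with the latest data."""
    members = sorted(voters)
    need = len(members) // 2 + 1
    return [
        group
        for size in range(need, len(members) + 1)
        for group in combinations(members, size)
        if up_to_date.isdisjoint(group)
    ]


# Before the swap B and C are current; then B is wiped and replaced.
print(stale_majorities({"A", "B", "C"}, up_to_date={"B", "C"}))  # []
print(stale_majorities({"A", "B", "C"}, up_to_date={"C"}))       # [('A', 'B')]
```

The single offending majority, A and B, is exactly the one that can form if C suffers a minority failure while the replaced node is still catching up.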
>>
>>
>> > Have you considered using dynamic reconfiguration? Removing the nodes
>> > logically first, then replacing them and adding them back in? You can do
>> > multiple servers at a time this way.
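For reference, the remove-then-add-back sequence maps onto the reconfig command of the 3.5 zkCli.sh roughly as below. This is a hedged sketch: the ports (2888/3888/2181) are only the conventional defaults, the hostname is made up, and replacement_commands is a hypothetical helper that just builds the command strings. On 3.5.3 and later, reconfig is also disabled unless reconfigEnabled=true is set and the caller is authorized, so check the docs for your exact version.

```python
def server_spec(sid, host, quorum_port=2888, election_port=3888, client_port=2181):
    """One dynamic-config entry:
    server.<id>=<host>:<quorum port>:<election port>:<role>;<client port>"""
    return f"server.{sid}={host}:{quorum_port}:{election_port}:participant;{client_port}"


def replacement_commands(servers):
    """`servers` maps id -> hostname for the members being swapped out.
    Returns the two zkCli commands; the machines are reprovisioned between them."""
    ids = sorted(servers)
    leaving = ",".join(str(sid) for sid in ids)
    joining = ",".join(server_spec(sid, servers[sid]) for sid in ids)
    return [f"reconfig -remove {leaving}", f"reconfig -add {joining}"]


for cmd in replacement_commands({2: "zk2.example.internal"}):
    print(cmd)
# reconfig -remove 2
# reconfig -add server.2=zk2.example.internal:2888:3888:participant;2181
```

Passing several IDs in the map builds the multi-server variant.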
>>
>>
>> Does dynamic reconfiguration as you suggest here buy me anything in a
>> 3-node cluster? No matter what I'm going to be at N+0 during the
>> transition, so doesn't it just add more steps for the same result?
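The arithmetic behind the N+0 remark: with n voters a strict majority must survive, so the ensemble tolerates (n - 1) // 2 failures. A quick check (the function name is just illustrative):

```python
def tolerated_failures(n_voters):
    """Failures an ensemble of n voters survives while keeping a strict majority."""
    return (n_voters - 1) // 2


for n in (3, 5):
    print(f"{n} voters: tolerates {tolerated_failures(n)}, "
          f"with one removed: {tolerated_failures(n - 1)}")
# 3 voters: tolerates 1, with one removed: 0
# 5 voters: tolerates 2, with one removed: 1
```

So in a 3-node ensemble the transition is N+0 either way; what reconfiguration changes is which servers a surviving quorum is guaranteed to contain, not how many failures it survives.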
>>
>>
>> > Or, you can give new servers higher IDs, add them using reconfig, and
>> > later remove the old servers. Reconfiguration ensures that a quorum
>> > always has the data.
>> >
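A sketch of the higher-ID alternative in the same zkCli.sh reconfig syntax: add the new servers under fresh IDs first, and remove the old IDs only afterwards. Again the ports are conventional defaults, the hostnames are invented, and higher_id_commands is a hypothetical helper.

```python
def higher_id_commands(old_ids, new_hosts, quorum_port=2888,
                       election_port=3888, client_port=2181):
    """Build the add-then-remove zkCli commands. New servers get IDs above
    the current maximum; `new_hosts` lists their (hypothetical) hostnames."""
    first_new = max(old_ids) + 1
    joining = ",".join(
        f"server.{first_new + i}={host}:{quorum_port}:{election_port}"
        f":participant;{client_port}"
        for i, host in enumerate(new_hosts)
    )
    leaving = ",".join(str(sid) for sid in sorted(old_ids))
    return [f"reconfig -add {joining}", f"reconfig -remove {leaving}"]


new_hosts = ["zk4.example.internal", "zk5.example.internal", "zk6.example.internal"]
for cmd in higher_id_commands({1, 2, 3}, new_hosts):
    print(cmd)
```

The first command should print server.4 through server.6 entries and the second should print "reconfig -remove 1,2,3". The catch with this approach is that the hostnames change.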
>>
>> My admittedly terrible motivation for avoiding that is that I want to
>> preserve hostnames, to avoid reconfiguring clients. This is in a cloud
>> environment where DNS is tied to instance name, so I can't play tricks at
>> the network layer; at some point I have to delete the old instances and
>> set up new ones with the same name. I suppose I could do a careful dance
>> where I grow to 5 nodes, then do a rolling removal/re-add of the first 3,
>> so that I can stay at N+1 during the replacement, and just trust that
>> clients can reach at least one of the first 3 replicas to discover the
>> entire cluster.
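The 5-node dance can be checked step by step with the same majority arithmetic. This sketch only counts registered voters and ignores sync timing (an assumption), and five_node_plan is a hypothetical helper.

```python
def tolerated_failures(n_voters):
    """Failures an ensemble of n voters survives while keeping a strict majority."""
    return (n_voters - 1) // 2


def five_node_plan(old=(1, 2, 3), extra=(4, 5)):
    """Yield (step, voters) for: add the extra servers, then remove and
    re-add each original ID in turn (new machine, same ID and hostname)."""
    voters = set(old) | set(extra)
    yield f"add {','.join(map(str, extra))}", frozenset(voters)
    for sid in old:
        voters.discard(sid)
        yield f"remove {sid}", frozenset(voters)
        voters.add(sid)
        yield f"re-add {sid}", frozenset(voters)


for step, voters in five_node_plan():
    print(f"{step:<9} voters={sorted(voters)} "
          f"tolerates={tolerated_failures(len(voters))}")
```

The minimum tolerance over all steps is 1, which is the N+1 being aimed for; whether to remove 4 and 5 again afterwards is left open here.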
>>
>> - Dave
>>
>>
>> > Alex
>> >
>> >
>> >
>> > On Mon, Apr 1, 2019 at 2:51 PM David Anderson <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I have a running ZooKeeper (3.5) cluster where the machines need to be
>> > > replaced. I was thinking of just setting the same ID on each new
>> > > machine, and then doing a rolling replacement: take down old ID 1,
>> > > start new ID 1, let it rejoin the cluster and replicate the state,
>> > > then continue with the other replicas.
>> > >
>> > > I'm finding conflicting information on the internet about the safety
>> > > of this. The Apache Kafka FAQ says to do exactly this when replacing a
>> > > failed ZooKeeper replica, and that the new machine will just replicate
>> > > the state before participating in the quorum. Other places on the
>> > > internet say that reusing the ID without also copying over the state
>> > > directory will break assumptions that ZAB makes about replicas, with bad
>> > > (but unspecified) consequences.
>> > >
>> > > So, is it safe to reuse IDs in the way I described? If not, what's the
>> > > suggested procedure for a rolling replacement of all cluster replicas?
>> > >
>> > > Thanks,
>> > > - Dave
>> > >
>> >
>>
>

Re: Is it safe to reuse zookeeper replica ID when reprovisioning?

Reply via email to