Hi Yang,

This is something we've been thinking about internally. It's especially important if we want to implement auto-scaling for bookies.
I'm not sure we need a "draining" state as such, or at least, it doesn't need to exist at the same level as "read-only". "Draining" is only interesting to the entity moving data off of the bookie, so the auditor could keep a record that it is draining bookie X, but the cluster as a whole doesn't need to care about it.

From a logical point of view, decommissioning/scale-down should be a matter of:

1. Mark the bookie as read-only so that no new data is added to it.
2. Wait for the bookie to hold no live data.

The most important thing, and something we have really missed in the past, is the ability to mark a running bookie as read-only. This should be trivial to implement, though the split between bookie-shell and bk-cli is a bit of a mess right now. I think there is a REST API endpoint, but it is non-persistent (there's a rough sketch of this in the P.S. below).

Once the bookie is read-only, we have the following options for getting live data off of it:

1. Wait for the Pulsar retention period to pass (currently available, but can take a long time).
2. Use tiered storage to move older data off. This is currently implemented as a Pulsar feature, but I think it would make sense to move it down to the bookie layer.
3. Use the auditor/autorecovery to move the data (also sketched in the P.S.).

Personally, I'm not a fan of the auditor/autorecovery machinery currently in BookKeeper. Any time we've relied on it in the past, it has either blown up on us or moved too slowly to be useful. Part of the problem is that it conflates data integrity checking with bookie decommissioning. Data integrity checking is concerned with ensuring a bookie has the data ZooKeeper says it has; decommissioning is moving data off of a bookie. The former should be naturally cheap and the latter expensive, but with autorecovery they both end up expensive.

A problem with both tiered storage and autorecovery for decommissioning is that they need to move data, and so induce load on the system. This load isn't well quantified, so both rely on manually set rate limits, which don't respond to the rest of the load in the system. The first thing we need to do, and we are actively working on this, is to generate accurate utilization and saturation metrics for bookies. Once we have these metrics, the entity copying the data can move much faster without impacting production traffic (the last sketch in the P.S. gestures at this).

Cheers,
Ivan
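
P.S. A few rough sketches to make the above concrete. First, flipping a running bookie to read-only over HTTP. I believe the admin server exposes a read-only state endpoint roughly like the one below, but the path and JSON payload here are from memory, so treat them as assumptions; and as noted above, the setting doesn't survive a restart.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class MarkBookieReadOnly {
        public static void main(String[] args) throws Exception {
            // Assumed endpoint on the bookie's HTTP admin server; path and
            // body are from memory, not verified against a specific release.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://bookie-1:8080/api/v1/bookie/state/readonly"))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString("{\"readOnly\":true}"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // The flag is held in memory only: the bookie comes back
            // read-write after a restart, which is exactly the gap above.
            System.out.println(response.statusCode() + " " + response.body());
        }
    }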
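
Second, what option 3 looks like through the client admin API today. BookKeeperAdmin.decommissionBookie() drives the auditor/autorecovery path I'm complaining about: it blocks until no ledger has a fragment left on the given bookie. The hostnames and metadata URI are placeholders, and older releases take a BookieSocketAddress rather than a BookieId.

    import org.apache.bookkeeper.client.BookKeeperAdmin;
    import org.apache.bookkeeper.conf.ClientConfiguration;
    import org.apache.bookkeeper.net.BookieId;

    public class DecommissionBookie {
        public static void main(String[] args) throws Exception {
            ClientConfiguration conf = new ClientConfiguration();
            // Placeholder URI; point this at your own ZK ensemble.
            conf.setMetadataServiceUri("zk+hierarchical://zk-1:2181/ledgers");

            BookKeeperAdmin admin = new BookKeeperAdmin(conf);
            try {
                // Blocks until autorecovery has re-replicated every ledger
                // fragment held by the bookie, i.e. until it holds no live
                // data. This is the path that is hard to rate-limit sensibly.
                admin.decommissionBookie(BookieId.parse("bookie-1:3181"));
            } finally {
                admin.close();
            }
        }
    }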
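
Finally, the direction the utilization/saturation metrics point in. This is purely illustrative: fetchWriteSaturation() and the copy helpers are hypothetical stand-ins for whatever the bookies end up exporting. The point is that the copier's rate becomes a feedback loop on measured saturation rather than a manually set constant.

    import com.google.common.util.concurrent.RateLimiter;

    public class AdaptiveCopier {
        // Hypothetical: fraction of write capacity in use on the target
        // bookies, 0.0 (idle) to 1.0 (saturated). A real version would read
        // the utilization/saturation metrics mentioned above.
        static double fetchWriteSaturation() {
            return 0.5;
        }

        public static void main(String[] args) {
            RateLimiter limiter = RateLimiter.create(100.0); // entries/sec, initial guess

            while (haveEntriesToCopy()) {
                // Back off as the cluster approaches saturation and speed up
                // when it is idle, instead of obeying a fixed manual limit.
                double saturation = fetchWriteSaturation();
                double targetRate = 1000.0 * Math.max(0.05, 1.0 - saturation);
                limiter.setRate(targetRate);

                limiter.acquire();
                copyNextEntry();
            }
        }

        // Hypothetical placeholders for the actual data movement.
        static boolean haveEntriesToCopy() { return false; }
        static void copyNextEntry() {}
    }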