Hi Yang,

This is something we've been thinking about internally. It's especially important if we want to implement auto-scaling for bookies.
I'm not sure we need a "draining" state as such, or at least, it doesn't need to exist at the same level as "read-only". "Draining" is only interesting to the entity moving data off of the bookie, so the auditor could keep a record that it is draining bookie X, but the cluster as a whole doesn't need to care about it.

From a logical point of view, decommissioning/scale-down should be a matter of:

1. Mark the bookie as read-only so that no new data is added to it.
2. Wait for the bookie to hold no live data.

The most important thing, and something we have really missed in the past, is the ability to mark a running bookie as read-only. This should be trivial to implement, though the split between bookie-shell and bk-cli is a bit of a mess right now. I think there is a REST API endpoint, but it is non-persistent (there's a rough sketch of this in the P.S. below).

Once the bookie is read-only, we have the following options for getting live data off of it:

1. Wait for the Pulsar retention period to pass (currently available, but can take a long time).
2. Use tiered storage to move older data off. This is currently implemented as a Pulsar feature, but I think it would make sense to move it down to the bookie layer.
3. Use the auditor/autorecovery to move the data (also sketched in the P.S.).

Personally, I'm not a fan of the auditor/autorecovery machinery currently in BookKeeper. Any time we've relied on it in the past, it has either blown up on us or moved too slowly to be useful. Part of the problem is that it conflates data integrity checking with bookie decommissioning. Data integrity checking is concerned with ensuring a bookie has the data ZooKeeper says it has; decommissioning is moving data off of a bookie. The former should be naturally cheap and the latter expensive, but with autorecovery they both end up expensive.

A problem with both tiered storage and autorecovery for decommissioning is that they need to move data, and so induce load on the system. This load isn't well quantified, so both rely on manually set rate limits, which don't respond to the rest of the load in the system. The first thing we need to do, and we are actively working on this, is to generate accurate utilization and saturation metrics for bookies. Once we have these metrics, the entity copying the data can move much faster without impacting production traffic (the last sketch in the P.S. gestures at this).

Cheers,
Ivan
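
P.S. A few rough sketches to make the above concrete. First, flipping a running bookie to read-only over HTTP. I believe the admin server exposes a read-only state endpoint roughly like the one below, but the path and JSON payload here are from memory, so treat them as assumptions; and as noted above, the setting doesn't survive a restart.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class MarkBookieReadOnly {
        public static void main(String[] args) throws Exception {
            // Assumed endpoint on the bookie's HTTP admin server; path and
            // body are from memory, not verified against a specific release.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://bookie-1:8080/api/v1/bookie/state/readonly"))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString("{\"readOnly\":true}"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // The flag is held in memory only: the bookie comes back
            // read-write after a restart, which is exactly the gap above.
            System.out.println(response.statusCode() + " " + response.body());
        }
    }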
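
Second, what option 3 looks like through the client admin API today. BookKeeperAdmin.decommissionBookie() drives the auditor/autorecovery path I'm complaining about: it blocks until no ledger has a fragment left on the given bookie. The hostnames and metadata URI are placeholders, and older releases take a BookieSocketAddress rather than a BookieId.

    import org.apache.bookkeeper.client.BookKeeperAdmin;
    import org.apache.bookkeeper.conf.ClientConfiguration;
    import org.apache.bookkeeper.net.BookieId;

    public class DecommissionBookie {
        public static void main(String[] args) throws Exception {
            ClientConfiguration conf = new ClientConfiguration();
            // Placeholder URI; point this at your own ZK ensemble.
            conf.setMetadataServiceUri("zk+hierarchical://zk-1:2181/ledgers");

            BookKeeperAdmin admin = new BookKeeperAdmin(conf);
            try {
                // Blocks until autorecovery has re-replicated every ledger
                // fragment held by the bookie, i.e. until it holds no live
                // data. This is the path that is hard to rate-limit sensibly.
                admin.decommissionBookie(BookieId.parse("bookie-1:3181"));
            } finally {
                admin.close();
            }
        }
    }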
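
Finally, the direction the utilization/saturation metrics point in. This is purely illustrative: fetchWriteSaturation() and the copy helpers are hypothetical stand-ins for whatever the bookies end up exporting. The point is that the copier's rate becomes a feedback loop on measured saturation rather than a manually set constant.

    import com.google.common.util.concurrent.RateLimiter;

    public class AdaptiveCopier {
        // Hypothetical: fraction of write capacity in use on the target
        // bookies, 0.0 (idle) to 1.0 (saturated). A real version would read
        // the utilization/saturation metrics mentioned above.
        static double fetchWriteSaturation() {
            return 0.5;
        }

        public static void main(String[] args) {
            RateLimiter limiter = RateLimiter.create(100.0); // entries/sec, initial guess

            while (haveEntriesToCopy()) {
                // Back off as the cluster approaches saturation and speed up
                // when it is idle, instead of obeying a fixed manual limit.
                double saturation = fetchWriteSaturation();
                double targetRate = 1000.0 * Math.max(0.05, 1.0 - saturation);
                limiter.setRate(targetRate);

                limiter.acquire();
                copyNextEntry();
            }
        }

        // Hypothetical placeholders for the actual data movement.
        static boolean haveEntriesToCopy() { return false; }
        static void copyNextEntry() {}
    }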