Hello Gareth,

There is a checkpoint file, co-located with the state store data inside the state directory, that records the corresponding changelog offset for that store. After the partition is migrated to a new owner, this checkpoint file, along with the state store itself, is not deleted immediately but follows a cleanup delay policy (controlled by state.cleanup.delay.ms).
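For readers of the archive: the checkpoint file described above is a small plain-text file in each task's state directory. As a minimal sketch, assuming the layout used by Streams' internal OffsetCheckpoint class (a version line, an entry count, then one "topic partition offset" triple per line; this is internal format, not a public contract), it could be parsed like so:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of parsing a Kafka Streams ".checkpoint" file. Assumed layout
// (internal format, not a public contract): line 1 = format version,
// line 2 = number of entries, then one "topic partition offset" per line.
public class CheckpointSketch {
    static Map<String, Long> parse(String contents) {
        String[] lines = contents.trim().split("\n");
        int version = Integer.parseInt(lines[0].trim()); // e.g. 0
        int count = Integer.parseInt(lines[1].trim());   // number of entries
        Map<String, Long> offsets = new HashMap<>();
        for (int i = 0; i < count; i++) {
            String[] parts = lines[2 + i].trim().split(" ");
            // key = "topic-partition", value = committed changelog offset
            offsets.put(parts[0] + "-" + parts[1], Long.parseLong(parts[2]));
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical changelog topic name, for illustration only.
        String example = "0\n1\nmy-app-store-changelog 0 12345\n";
        System.out.println(parse(example)); // {my-app-store-changelog-0=12345}
    }
}
```

This offset is how a restarted instance knows where its local state left off: on restore, Streams only needs to replay the changelog from the checkpointed offset rather than from the beginning.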
Guozhang

On Sun, Apr 11, 2021 at 11:13 AM Gareth Collins <gareth.o.coll...@gmail.com> wrote:

> Hi Guozhang,
>
> Thanks very much again for the answers!
>
> One follow-up on the first question. Just so I understand it, how would it
> know where to continue from? I would assume that once we repartition, the
> new node will own the position in the consumer group for the relevant
> partition(s), so Kafka/ZooKeeper would not know the position of the dead
> node anymore. Is the position also stored in RocksDB somehow?
>
> thanks in advance,
> Gareth
>
> On Mon, Apr 5, 2021 at 6:34 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Gareth,
> >
> > 1) For this scenario, its state should be reusable and we do not need to
> > read from scratch from Kafka to rebuild it.
> >
> > 2) A "warmup replica" is just a special standby replica that is
> > temporary; note that if no partition migration is needed at the moment,
> > the number of warmup replicas is actually zero. The difference between
> > the max.warmup.replicas config and the num.standby.replicas config is
> > that the former is a global limit, while the latter is a per-task
> > number. I.e., if you have a total of N tasks and you set these configs
> > to P and Q, then during normal processing you will have (Q+1) * N total
> > replicas, while during a rebalance you may have up to (Q+1) * N + P
> > total replicas. As you can see, setting P to a value larger than one
> > means a single rebalance run may be able to warm up multiple partitions
> > yet to be moved, at the cost of more space temporarily, while a smaller
> > value means you may need more rounds of rebalances to achieve the end
> > rebalance goal.
> >
> > 3) Yes, if there are standby replicas, then you can still access the
> > standbys' states via IQ.
> > You can read this KIP for more details:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-535%3A+Allow+state+stores+to+serve+stale+reads+during+rebalance
> >
> > Guozhang
> >
> > On Sun, Apr 4, 2021 at 12:41 PM Gareth Collins <gareth.o.coll...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Thanks very much for the answers to my previous questions here.
> > >
> > > I had a couple more questions about repartitioning, and I just want to
> > > confirm my understanding.
> > >
> > > (1) Given the following scenario:
> > >
> > > (a) I have a cluster of Kafka Streams nodes with partitions assigned
> > > to each.
> > >
> > > (b) One node goes down... and it stays down for long enough that a
> > > repartition happens (i.e. a time greater than
> > > scheduled.rebalance.max.delay.ms passes by).
> > >
> > > (c) Then the node finally comes back. If the state is still there, can
> > > it still be used (assuming it is assigned the same partitions), with
> > > only the delta read from Kafka? Or will it need to read everything
> > > again to rebuild the state? I assume it has to re-read the state, but
> > > I want to make sure.
> > >
> > > (2) I understand warmup replicas help with minimizing downtime. If I
> > > understand correctly, if I have at least one warmup replica configured
> > > and the state needed to be rebuilt from scratch in the scenario above,
> > > switchover back to the old node will be delayed until the rebuild is
> > > complete. Is my understanding correct? If so, why would you ever set
> > > more than one warmup replica? Or should warmup replicas usually be
> > > equal to standby replicas + 1, just in case multiple nodes are stood
> > > up simultaneously?
> > >
> > > (3) If I set the scheduled rebalance delay to be greater than 0 and a
> > > node goes down, will I be able to access the state data from other
> > > replicas while I am waiting for the rebalance?
> > >
> > > Any answers would be greatly appreciated.
> > >
> > > thanks,
> > > Gareth
> >
> > --
> > -- Guozhang

--
-- Guozhang
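The replica arithmetic from answer (2) in the thread above can be checked with a quick sketch. The task count, standby count, and warmup limit below are hypothetical numbers chosen for illustration, not values from the thread:

```java
// Replica counts as described in answer (2): with N tasks,
// num.standby.replicas = Q (per task) and max.warmup.replicas = P (global),
// steady state holds (Q + 1) * N replicas (one active plus Q standbys per
// task), and a rebalance may temporarily add up to P warmup replicas.
public class ReplicaMath {
    static int steadyState(int n, int q) {
        return (q + 1) * n;
    }

    static int rebalancePeak(int n, int q, int p) {
        return (q + 1) * n + p;
    }

    public static void main(String[] args) {
        int n = 10, q = 1, p = 2; // hypothetical: 10 tasks, 1 standby each, 2 warmups
        System.out.println(steadyState(n, q));      // 20 replicas normally
        System.out.println(rebalancePeak(n, q, p)); // up to 22 during a rebalance
    }
}
```

This also illustrates the trade-off Guozhang describes: raising P only increases the temporary peak during rebalances, not the steady-state footprint, in exchange for migrating more partitions per rebalance round.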