Questions around Error Handling and Shuffling

vish . ramachandran Mon, 08 Oct 2018 17:12:01 -0700

Hi Helix community,

I have a few questions about partition-level error handling and
rebalancing. I am using automatic rebalance mode with the Leader/Standby
state model.


1. An error can occur during the state transition from STANDBY to LEADER. If
an exception is thrown, the partition's state changes to ERROR. However, the
partition is not reassigned to another node immediately; it stays in the
ERROR state until a new node comes up. Is there a way to trigger the
reassignment earlier and automatically (or to retry periodically on the same
node)? Is there a way to transition automatically from ERROR to DROPPED?

2. While an instance is serving partitions normally, how can it signal an
error for only one of them? I would like that single partition to be
reassigned to another instance (or retried periodically on the same instance
if the others do not have room).

3. It would be ideal if there were a setting for a minimum number of
partitions per node, to prevent partitions from being shuffled among
instances when new nodes join the cluster. Does such a rebalancing strategy
(or a workaround) already exist? I would rather have a few instances sit idle
as spares, ready for failover, than have partitions shuffle around, given
that it takes some time to warm up a partition.
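
To make question 3 concrete, here is a minimal sketch of the kind of
"sticky" assignment described there. It is plain Java and uses no Helix
APIs; the `assign` helper and the `capacityPerNode` parameter are
hypothetical names, and capping by capacity is only one assumed way to get
the effect. Partitions stay on their current owner while it is alive, so a
newly joined node sits idle until a failure orphans some partitions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StickyAssignmentSketch {

    // Hypothetical helper, not a Helix API. Keeps each partition on its
    // current owner if that owner is still live, and only places partitions
    // whose owner is missing or dead.
    static Map<String, String> assign(Map<String, String> current,
                                      List<String> partitions,
                                      List<String> liveNodes,
                                      int capacityPerNode) {
        Map<String, String> next = new TreeMap<>();
        Map<String, Integer> load = new TreeMap<>();
        for (String node : liveNodes) {
            load.put(node, 0);
        }

        // Pass 1: never move a partition whose owner is still live.
        for (String p : partitions) {
            String owner = current.get(p);
            if (owner != null && load.containsKey(owner)) {
                next.put(p, owner);
                load.merge(owner, 1, Integer::sum);
            }
        }

        // Pass 2: place orphaned partitions on the least-loaded live node
        // that still has room. If no node has room, the partition stays
        // unassigned and can be retried on the next call.
        for (String p : partitions) {
            if (next.containsKey(p)) {
                continue;
            }
            String best = null;
            for (String node : liveNodes) {
                int l = load.get(node);
                if (l < capacityPerNode && (best == null || l < load.get(best))) {
                    best = node;
                }
            }
            if (best != null) {
                next.put(p, best);
                load.merge(best, 1, Integer::sum);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> current =
            Map.of("p0", "n1", "p1", "n1", "p2", "n2", "p3", "n2");
        List<String> partitions = List.of("p0", "p1", "p2", "p3");

        // n3 joins: nothing moves, n3 stays idle as a spare.
        System.out.println(assign(current, partitions, List.of("n1", "n2", "n3"), 2));
        // n2 dies: only its partitions move, and they land on the spare.
        System.out.println(assign(current, partitions, List.of("n1", "n3"), 2));
    }
}
```

In this sketch only the partitions of a failed node ever move, which is the
property I care about because warming up a partition is slow.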

Thanks,
Vish
