Hi Helix community,

I have a few questions about partition-level error handling and rebalancing. I am using the automatic rebalance mode with the Leader/Standby state model.

1. An error can occur during the state transition from STANDBY to LEADER. If an exception is thrown, the partition's state changes to ERROR. However, the partition is not reassigned to another node immediately; it stays in the ERROR state until a new node comes up. Is there a way to trigger the reassignment earlier and automatically (or to retry periodically on the same node)? Is there a way to transition automatically from ERROR to DROPPED?

2. While an instance is serving a partition normally, how can it signal an error for just that one partition? I would like that single partition to be reassigned to another instance (or periodically retried on the same instance if the others do not have room).

3. It would be ideal to have a setting for the minimum number of partitions per node, to prevent partitions from being shuffled among instances when new nodes join the cluster. Does such a rebalancing option (or a workaround) already exist? Since it takes some time to warm up a partition, I would rather have a few instances sit idle as spares, ready for failover, than have partitions shuffle around.

Thanks,
Vish
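
P.S. To make question 3 concrete, here is a rough sketch of the behavior I have in mind. It is plain Java and does not use any Helix API; the class name, the `rebalance` method, and the `maxPerInstance` cap are all hypothetical names invented for illustration. The cap is my assumption about what "room" on an instance means. The idea is that a partition stays on its current owner as long as that owner is alive, and only partitions that have lost their owner are moved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, NOT Helix API: every name here is made up for illustration.
public class StickyAssignmentSketch {

    /**
     * Keeps every partition on its current owner while that owner is alive,
     * and moves only orphaned partitions to the least-loaded live instance
     * that is still below maxPerInstance (an assumed capacity setting).
     */
    public static Map<String, String> rebalance(Map<String, String> current,
                                                List<String> liveInstances,
                                                int maxPerInstance) {
        Map<String, String> next = new TreeMap<>();
        Map<String, Integer> load = new HashMap<>();
        for (String instance : liveInstances) {
            load.put(instance, 0);
        }

        // Pass 1: partitions whose owner is still alive stay where they are.
        List<String> orphaned = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(current).entrySet()) {
            String owner = e.getValue();
            if (owner != null && load.containsKey(owner)) {
                next.put(e.getKey(), owner);
                load.merge(owner, 1, Integer::sum);
            } else {
                orphaned.add(e.getKey());
            }
        }

        // Pass 2: each orphaned partition goes to the least-loaded instance with room.
        for (String partition : orphaned) {
            String target = null;
            for (String instance : liveInstances) {
                int n = load.get(instance);
                if (n < maxPerInstance && (target == null || n < load.get(target))) {
                    target = instance;
                }
            }
            if (target != null) {
                next.put(partition, target);
                load.merge(target, 1, Integer::sum);
            }
            // Otherwise no instance has room, so the partition is left unassigned.
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put("p0", "nodeA");
        current.put("p1", "nodeA");
        current.put("p2", "nodeB");

        // nodeC joins: nothing moves, nodeC sits idle as a spare.
        System.out.println(rebalance(current, Arrays.asList("nodeA", "nodeB", "nodeC"), 2));

        // nodeB goes away: only p2 moves, and it lands on the idle spare.
        System.out.println(rebalance(current, Arrays.asList("nodeA", "nodeC"), 2));
    }
}
```

With something like this, a newly arrived node would take no partitions until an existing owner disappears, which is the spare-instance behavior I am after.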
