desaikomal opened a new issue, #2521: URL: https://github.com/apache/helix/issues/2521
### Describe the bug

Helix maintains the metric `MaxSinglePartitionTopStateHandoffDuration`, which reports the longest time any partition has been without its top state. We handle this well for resources/partitions with multiple replicas. We do not seem to handle a partition that has only 1 replica and `minActiveReplica` set to 0.

In that case there is only one replica, so if the node hosting that partition/replica reboots, the metric starts reporting that the partition is missing its top state. Even after the node is back online, we continue to report the top state as missing. The reason is a special-case check that the node currently holding the top state is different from the previous node. In the scenario above this is never true, because the top state comes back on the same node.

### To Reproduce

1. Create a resource/partition with just 1 replica and the min-active-replica count set to 0.
2. Once the partition is LEADER, reboot the node on which it is online.
3. The metric is triggered and never resets, even after the node is back.

### Expected behavior

The metric should clear once the node is back.
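The failure mode described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not Helix's controller code: the class `TopStateHandoffSketch` and its methods `onTopStateLost`, `onTopStateObserved`, and `isMissingTopState` are hypothetical names invented for illustration. The `requireDifferentInstance` flag models the special-case check from the issue ("current top-state node must differ from the previous node"); with a single replica, that check never lets the missing-top-state record clear.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of per-partition top-state handoff tracking.
 * NOT Helix's actual implementation; all names here are illustrative only.
 */
public class TopStateHandoffSketch {

  /** Record kept while a partition has no top-state replica. */
  static final class MissingTopState {
    final String previousTopStateInstance;
    final long lostAtMillis;

    MissingTopState(String previousTopStateInstance, long lostAtMillis) {
      this.previousTopStateInstance = previousTopStateInstance;
      this.lostAtMillis = lostAtMillis;
    }
  }

  private final Map<String, MissingTopState> missing = new HashMap<>();
  // Models the special case from the issue: only clear if the new top-state
  // instance differs from the one that held the top state before.
  private final boolean requireDifferentInstance;

  TopStateHandoffSketch(boolean requireDifferentInstance) {
    this.requireDifferentInstance = requireDifferentInstance;
  }

  /** Called when the instance holding the top state for a partition goes away. */
  void onTopStateLost(String partition, String instance, long nowMillis) {
    missing.putIfAbsent(partition, new MissingTopState(instance, nowMillis));
  }

  /**
   * Called when an instance reports the top state for a partition.
   * Returns the handoff duration in ms if the record was cleared, otherwise -1.
   */
  long onTopStateObserved(String partition, String instance, long nowMillis) {
    MissingTopState record = missing.get(partition);
    if (record == null) {
      return -1L; // partition was not missing its top state
    }
    if (requireDifferentInstance && instance.equals(record.previousTopStateInstance)) {
      // With a single replica the top state can only return on the same
      // instance, so this branch keeps the record (and the metric) forever.
      return -1L;
    }
    missing.remove(partition);
    return nowMillis - record.lostAtMillis;
  }

  boolean isMissingTopState(String partition) {
    return missing.containsKey(partition);
  }

  public static void main(String[] args) {
    for (boolean sameInstanceCheck : new boolean[] {true, false}) {
      TopStateHandoffSketch tracker = new TopStateHandoffSketch(sameInstanceCheck);
      tracker.onTopStateLost("resource_0", "node1", 1_000L); // node1 reboots
      long duration = tracker.onTopStateObserved("resource_0", "node1", 31_000L); // node1 is back
      System.out.println((sameInstanceCheck ? "same-instance check: " : "any-instance check:  ")
          + "cleared=" + !tracker.isMissingTopState("resource_0")
          + ", duration=" + duration);
    }
  }
}
```

Running `main` prints `cleared=false, duration=-1` for the same-instance check and `cleared=true, duration=30000` without it. One possible direction for a fix, assuming the model above is representative, is to clear the record whenever any instance (including the previous one) regains the top state; the issue itself does not say how the fix should be implemented.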
