desaikomal opened a new issue, #2521:
URL: https://github.com/apache/helix/issues/2521

   ### Describe the bug
   Helix maintains the metric MaxSinglePartitionTopStateHandoffDuration, which
reports the longest time any single partition has been without its top-state
replica.
   This works well for resources/partitions with multiple replicas, but it
does not seem to handle a partition that has only 1 replica and
minActiveReplica set to 0.
   
   In this case there is only one replica, so if the node hosting that
partition/replica reboots, the metric starts reporting that the partition is
missing its top state. Even after the node is back online, the metric keeps
reporting the top state as missing.
   
   The cause is a special case that checks whether the node currently holding
the top state is different from the previous top-state node. In the scenario
above this check never passes, because the same node regains the top state.
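   The following is a minimal, hypothetical sketch of the failure mode, not
Helix's actual implementation: the `Tracker` class, its methods, and the
`requireDifferentNode` flag are illustrative assumptions. It models a tracker
that only treats a handoff as complete when the top state moves to a different
node, and compares it with one that accepts any node regaining the top state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of top-state handoff tracking (not Helix code). */
public class TopStateHandoffSketch {

  static final class Tracker {
    // true models the reported special case: only a *different* node completes the handoff
    private final boolean requireDifferentNode;
    private final Map<String, String> lastTopStateNode = new HashMap<>();
    private final Map<String, Long> missingSinceMs = new HashMap<>();

    Tracker(boolean requireDifferentNode) {
      this.requireDifferentNode = requireDifferentNode;
    }

    /** Called when a partition's top-state replica is observed on a node. */
    void onTopState(String partition, String node) {
      String previous = lastTopStateNode.get(partition);
      boolean handoffComplete =
          !requireDifferentNode || previous == null || !previous.equals(node);
      if (handoffComplete) {
        missingSinceMs.remove(partition); // stop reporting the partition as missing top state
      }
      lastTopStateNode.put(partition, node);
    }

    /** Called when the partition loses its top state; records when it went missing. */
    void onTopStateLost(String partition, long nowMs) {
      missingSinceMs.putIfAbsent(partition, nowMs);
    }

    boolean isMissingTopState(String partition) {
      return missingSinceMs.containsKey(partition);
    }
  }

  /** Replays the reproduction steps and reports whether the partition is still flagged. */
  static boolean simulate(boolean requireDifferentNode) {
    Tracker t = new Tracker(requireDifferentNode);
    t.onTopState("db_0", "node1");    // the single replica becomes LEADER on node1
    t.onTopStateLost("db_0", 1_000L); // node1 reboots; minActiveReplica = 0, so nothing takes over
    t.onTopState("db_0", "node1");    // node1 returns and the same replica is LEADER again
    return t.isMissingTopState("db_0");
  }

  public static void main(String[] args) {
    System.out.println("different-node check: still missing = " + simulate(true));
    System.out.println("any-node check:       still missing = " + simulate(false));
  }
}
```

With the different-node check the partition stays flagged after node1 returns,
which matches the reported symptom; clearing the flag whenever any node
(including the previous one) regains the top state resolves it in this model.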
   
   
   ### To Reproduce
   Create a resource/partition with just 1 replica and the min-active-replica
count set to 0. Once the partition is LEADER, reboot the node on which the
resource/partition was online. The metric is triggered and never resets, even
after the node is back.
   
   ### Expected behavior
   The metric should clear once the node is back online and the partition
regains its top state.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
