rahulrane50 commented on code in PR #2344:
URL: https://github.com/apache/helix/pull/2344#discussion_r1084406786


##########
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java:
##########
@@ -55,6 +55,54 @@
 import org.slf4j.LoggerFactory;
 
 public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
+  private class AsyncMissingTopStateMonitor extends Thread {
+    private final Map<String, Map<String, Long>> _missingTopStateResourceMap;
+    private long _missingTopStateDurationThreshold = Long.MAX_VALUE;;
+
+    public AsyncMissingTopStateMonitor(Map<String, Map<String, Long>> missingTopStateResourceMap) {
+      _missingTopStateResourceMap = missingTopStateResourceMap;
+    }
+
+    public void setMissingTopStateDurationThreshold(long missingTopStateDurationThreshold) {
+      _missingTopStateDurationThreshold = missingTopStateDurationThreshold;
+    }
+
+    @Override
+    public void run() {
+      try {
+        synchronized (this) {
+          while (true) {
+            while (_missingTopStateResourceMap.size() == 0) {
+              this.wait();
+            }
+            for (Iterator<Map.Entry<String, Map<String, Long>>> resourcePartitionIt =
+                _missingTopStateResourceMap.entrySet().iterator(); resourcePartitionIt.hasNext(); ) {
+              Map.Entry<String, Map<String, Long>> resourcePartitionEntry = resourcePartitionIt.next();
+              // Iterate over all partitions and if any partition has missing top state greater than threshold then report
+              // it.
+              ResourceMonitor resourceMonitor = getOrCreateResourceMonitor(resourcePartitionEntry.getKey());
+              // If all partitions of resource has top state recovered then reset the counter
+              if (resourcePartitionEntry.getValue().isEmpty()) {
+                resourceMonitor.resetOneOrManyPartitionsMissingTopStateRealTimeGuage();
+                resourcePartitionIt.remove();
+              } else {
+                for (Long missingTopStateStartTime : resourcePartitionEntry.getValue().values()) {
+                  if (_missingTopStateDurationThreshold < Long.MAX_VALUE && System.currentTimeMillis() - missingTopStateStartTime > _missingTopStateDurationThreshold) {
+                    resourceMonitor.updateOneOrManyPartitionsMissingTopStateRealTimeGuage();
+                  }
+                }
+              }
+            }
+            // TODO: Check if this SLEEP_TIME is correct? Thread should keep on increasing the counter continuously until top
+            //  state is recovered but it can sleep for reasonable amount of time in between.
+            sleep(100);

Review Comment:
   I just replied to the previous comment. This sleep() is unrelated to the
main conditional wait() above. It is here because the for loop increments the
counters on every iteration, which could overflow a counter if the
missing-top-state condition persists for a long time. There is also no need to
increment the counter every millisecond or nanosecond. As the TODO says, the
async thread should keep incrementing the counter until the top state
recovers, but with a reasonable sleep in between (for example, the average
missing-top-state duration / 4 in ms, or something similar).
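
To make the throttling idea concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the class name, the `gauge` field, the `pollIntervalMs` helper, and the 100 ms / 10 s clamp bounds are all hypothetical, chosen only to illustrate deriving the sleep from the threshold (a quarter of it) rather than hard-coding 100 ms. One scan is factored into `scanOnce` so the per-iteration increment can be reasoned about separately from the sleep.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names are Helix APIs.
final class ThrottledMissingTopStateSketch {
  // Stand-in for the real-time gauge that the monitor thread increments.
  static final AtomicLong gauge = new AtomicLong();

  // Assumed policy: sleep for a quarter of the threshold, clamped to [100 ms, 10 s].
  static long pollIntervalMs(long thresholdMs) {
    return Math.max(100L, Math.min(10_000L, thresholdMs / 4));
  }

  // One scan: count partitions whose top state has been missing for longer
  // than the threshold, and add that count to the gauge.
  static int scanOnce(Map<String, Map<String, Long>> missing, long thresholdMs, long nowMs) {
    int overThreshold = 0;
    for (Map<String, Long> partitions : missing.values()) {
      for (long startMs : partitions.values()) {
        if (nowMs - startMs > thresholdMs) {
          overThreshold++;
        }
      }
    }
    gauge.addAndGet(overThreshold);
    return overThreshold;
  }

  public static void main(String[] args) throws InterruptedException {
    long thresholdMs = 400L;
    long now = System.currentTimeMillis();

    // resource -> (partition -> time at which the top state went missing)
    Map<String, Map<String, Long>> missing = new HashMap<>();
    Map<String, Long> partitions = new HashMap<>();
    partitions.put("db_0", now - 1_000L); // already over the threshold
    partitions.put("db_1", now);          // just went missing
    missing.put("db", partitions);

    // Three throttled scans; the gauge grows once per scan, not once per loop spin.
    for (int i = 0; i < 3; i++) {
      scanOnce(missing, thresholdMs, System.currentTimeMillis());
      Thread.sleep(pollIntervalMs(thresholdMs));
    }
    System.out.println("gauge after 3 scans: " + gauge.get());
  }
}
```

With a sleep of this kind, the number of increments over any period is bounded by the period divided by the poll interval, which is the overflow concern raised above.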



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

