junkaixue commented on code in PR #2344:
URL: https://github.com/apache/helix/pull/2344#discussion_r1090061039
##########
helix-core/src/main/java/org/apache/helix/controller/stages/TopStateHandoffReportStage.java:
##########
@@ -319,12 +322,16 @@ private void
reportTopStateHandoffFailIfNecessary(ResourceControllerDataProvider
String partitionName = partition.getPartitionName();
MissingTopStateRecord record =
missingTopStateMap.get(resourceName).get(partitionName);
long startTime = record.getStartTimeStamp();
- if (startTime > 0 && System.currentTimeMillis() - startTime >
durationThreshold && !record
- .isFailed()) {
- record.setFailed();
- missingTopStateMap.get(resourceName).put(partitionName, record);
+ if (startTime > 0 && !record.isFailed()) {
if (clusterStatusMonitor != null) {
- clusterStatusMonitor.updateMissingTopStateDurationStats(resourceName,
0L, 0L, false, false);
+ clusterStatusMonitor.updateMissingTopStateResourceMap(resourceName,
partitionName, true, startTime);
+ }
+ if (System.currentTimeMillis() - startTime > durationThreshold) {
+ record.setFailed();
+ missingTopStateMap.get(resourceName).put(partitionName, record);
+ if (clusterStatusMonitor != null) {
+
clusterStatusMonitor.updateMissingTopStateDurationStats(resourceName, 0L, 0L,
false, false);
Review Comment:
Maybe we can deprecate this piece of logic in the future. Because anyway we
already report the top state missing there is a one.
I am not sure whether would be useful to keep the duration one for
end-to-end.
##########
helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java:
##########
@@ -55,6 +55,61 @@
import org.slf4j.LoggerFactory;
public class ClusterStatusMonitor implements ClusterStatusMonitorMBean {
+ private class AsyncMissingTopStateMonitor extends Thread {
+ private final ConcurrentHashMap<String, Map<String, Long>>
_missingTopStateResourceMap;
+ private long _missingTopStateDurationThreshold = Long.MAX_VALUE;;
+
+ public AsyncMissingTopStateMonitor(ConcurrentHashMap<String, Map<String,
Long>> missingTopStateResourceMap) {
+ _missingTopStateResourceMap = missingTopStateResourceMap;
+ }
+
+ public void setMissingTopStateDurationThreshold(long
missingTopStateDurationThreshold) {
+ _missingTopStateDurationThreshold = missingTopStateDurationThreshold;
+ }
+
+ @Override
+ public void run() {
+ try {
+ synchronized (this) {
+ while (true) {
+ while (_missingTopStateResourceMap.size() == 0) {
+ this.wait();
+ }
+ for (Iterator<Map.Entry<String, Map<String, Long>>>
resourcePartitionIt =
+ _missingTopStateResourceMap.entrySet().iterator();
resourcePartitionIt.hasNext(); ) {
+ Map.Entry<String, Map<String, Long>> resourcePartitionEntry =
resourcePartitionIt.next();
+ // Iterate over all partitions and if any partition has missing
top state greater than threshold then report
+ // it.
+ ResourceMonitor resourceMonitor =
getOrCreateResourceMonitor(resourcePartitionEntry.getKey());
+ // If all partitions of resource has top state recovered then
reset the counter
+ if (resourcePartitionEntry.getValue().isEmpty()) {
+ resourceMonitor.resetMissingTopStateDurationGuage();
+ resourcePartitionIt.remove();
+ } else {
+ for (Long missingTopStateStartTime :
resourcePartitionEntry.getValue().values()) {
+ if (_missingTopStateDurationThreshold < Long.MAX_VALUE &&
System.currentTimeMillis() - missingTopStateStartTime >
_missingTopStateDurationThreshold) {
+
resourceMonitor.updateMissingTopStateDurationGuage(System.currentTimeMillis() -
missingTopStateStartTime);
+ }
+ }
+
+ }
+ }
+ sleep(50); // Instead of providing stream of durtion values to
histogram thread can sleep in between to save some CPU cycles.
Review Comment:
I am still concerning this number as hard coding here... It would not be a
good thing. Could you explain why adding this number helps?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]