lei-xia commented on a change in pull request #1413:
URL: https://github.com/apache/helix/pull/1413#discussion_r500445155
##########
File path:
helix-core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java
##########
@@ -299,6 +306,33 @@ private void updateMaintenanceInfo(final HelixDataAccessor accessor) {
// The following flag is to guarantee that there's only one update per pipeline run because we
// check for whether maintenance recovery could happen twice every pipeline
_hasMaintenanceSignalChanged = false;
+
+ // If maintenance mode has exited, clear cached timed-out nodes
+ if (!_isMaintenanceModeEnabled) {
+ _timedOutInstanceDuringMaintenance.clear();
+ }
+ }
+
+ private void timeoutNodesDuringMaintenance(final HelixDataAccessor accessor) {
+ // If maintenance mode is enabled and timeout window is specified, filter 'new' live nodes
+ // for timed-out nodes
+ long timeOutWindow = -1;
+ if (_clusterConfig != null) {
+ timeOutWindow = _clusterConfig.getOfflineNodeTimeOutForMaintenanceMode();
+ }
+ if (timeOutWindow >= 0 && isMaintenanceModeEnabled()) {
+ for (String instance : _liveInstanceCache.getPropertyMap().keySet()) {
+ // 1. Check timed-out cache and don't do repeated work;
+ // 2. Check for nodes that didn't exist in the last iteration, because it has been checked;
+ // 3. For all other nodes, check if it's timed-out.
+ // When maintenance mode is first entered, all nodes will be checked as a result.
+ if (!_timedOutInstanceDuringMaintenance.contains(instance)
+ && !_liveInstanceSnapshotForMaintenance.containsKey(instance)
+ && isInstanceTimedOutDuringMaintenance(accessor, instance, timeOutWindow)) {
+ _timedOutInstanceDuringMaintenance.add(instance);
Review comment:
_liveInstanceSnapshotForMaintenance is refreshed at the beginning of every
pipeline, so it contains all live instances (including those that should be
timed out), right? Say a long-offline instance comes back after the first
pipeline and before the next one: that instance will be included in
_liveInstanceSnapshotForMaintenance and won't be checked here? I.e., is
_liveInstanceSnapshotForMaintenance always equal to getLiveInstances()? If it
is, what is the point of keeping a separate cache?
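
To make the concern concrete, here is a minimal standalone sketch of the filter condition from the hunk above. It is not the PR's code: the class `SnapshotFilterSketch` and the helper `instancesReachingTimeoutCheck` are hypothetical, and plain sets stand in for the live-instance cache, the snapshot, and the timed-out cache. Whether the snapshot is actually refreshed before this check runs is exactly the open question; the sketch only shows what follows in each case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SnapshotFilterSketch {

  // Hypothetical helper mirroring the first two clauses of the condition in the diff:
  // an instance reaches the timeout check only if it is not already cached as
  // timed-out AND it is absent from the snapshot.
  static List<String> instancesReachingTimeoutCheck(Set<String> liveInstances,
      Set<String> snapshot, Set<String> alreadyTimedOut) {
    List<String> result = new ArrayList<>();
    // TreeSet only gives a deterministic iteration order for the demo output.
    for (String instance : new TreeSet<>(liveInstances)) {
      if (!alreadyTimedOut.contains(instance) && !snapshot.contains(instance)) {
        result.add(instance);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> live = new HashSet<>(Arrays.asList("node1", "node2", "node3"));
    Set<String> timedOut = new HashSet<>();

    // Case A (assumption): the snapshot was already refreshed to the current
    // live instances before the check. Every live instance is in the snapshot,
    // so nothing is ever checked.
    Set<String> refreshedSnapshot = new HashSet<>(live);
    System.out.println(instancesReachingTimeoutCheck(live, refreshedSnapshot, timedOut));
    // prints []

    // Case B (assumption): the snapshot still holds the previous pipeline's
    // live instances, and node3 rejoined in between. Only node3 is checked.
    Set<String> previousSnapshot = new HashSet<>(Arrays.asList("node1", "node2"));
    System.out.println(instancesReachingTimeoutCheck(live, previousSnapshot, timedOut));
    // prints [node3]
  }
}
```

If the snapshot behaves as in case A, the separate cache filters out every live instance and is indistinguishable from getLiveInstances(); it is only useful if it lags one pipeline behind, as in case B.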