pkuwm commented on a change in pull request #1037:
URL: https://github.com/apache/helix/pull/1037#discussion_r435040792
##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/constraint/ExcessiveTopStateResolver.java
##########
@@ -0,0 +1,120 @@
+package org.apache.helix.controller.rebalancer.constraint;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.helix.api.rebalancer.constraint.AbnormalStateResolver;
+import org.apache.helix.controller.stages.CurrentStateOutput;
+import org.apache.helix.model.Partition;
+import org.apache.helix.model.StateModelDefinition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The abnormal state resolver that gracefully fixes the abnormality of excessive top states for
+ * single-topstate state model. For example, two replicas of a MasterSlave partition are assigned
+ * with the Master state at the same time. This could be caused by a network partitioning or the
+ * other unexpected issues.
+ *
+ * The resolver checks for the abnormality and computes recovery assignment which triggers the
+ * rebalancer to eventually reset all the top state replicas for once. After the resets, only one
+ * replica will be assigned the top state.
+ *
+ * Note that without using this resolver, the regular Helix rebalance pipeline also removes the
+ * excessive top state replicas. However, the default logic does not force resetting ALL the top
+ * state replicas. Since the multiple top states situation may break application data, the default
+ * resolution won't be enough to fix the potential problem.
+ */
+public class ExcessiveTopStateResolver implements AbnormalStateResolver {
+  private static final Logger LOG = LoggerFactory.getLogger(ExcessiveTopStateResolver.class);
+
+  /**
+   * The current states are not valid if there are more than 2 top state replicas for a single top

Review comment:
Should this be more than **1** top state replica? Also, maybe reword it to "a single top state model"?

##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/constraint/ExcessiveTopStateResolver.java
##########
@@ -0,0 +1,112 @@
+package org.apache.helix.controller.rebalancer.constraint;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.helix.api.rebalancer.constraint.AbnormalStateResolver;
+import org.apache.helix.controller.stages.CurrentStateOutput;
+import org.apache.helix.model.Partition;
+import org.apache.helix.model.StateModelDefinition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The abnormal state resolver that gracefully fixes double-topstates issue for the single topstate
+ * state model.
+ * Note the regular Helix rebalance pipeline will also remove the excessive top state replica.
+ * However, the default rebalancer logic cannot guarantee a clean resolution. For example, if the
+ * double-topstates situation has already impacted the data of the top state replicas, then the
+ * controller should reset both of them, then bring back one top state replica on the right
+ * allocation. For the application which has such a requirement, they should use this resolver or
+ * a more advanced resolver which checks the application data to ensure the resolution is complete.
+ */
+public class ExcessiveTopStateResolver implements AbnormalStateResolver {
+  private static final Logger LOG = LoggerFactory.getLogger(ExcessiveTopStateResolver.class);
+
+  /**
+   * The current states are not valid if there are more than 2 top state replicas for a single top
+   * state state model.
+   */
+  @Override
+  public boolean isCurrentStatesValid(final CurrentStateOutput currentStateOutput,
+      final String resourceName, final Partition partition, StateModelDefinition stateModelDef) {
+    if (!stateModelDef.isSingleTopStateModel()) {
+      return true;
+    }
+    if (currentStateOutput.getCurrentStateMap(resourceName, partition).values().stream()
+        .filter(state -> state.equals(stateModelDef.getTopState())).count() > 1) {
+      return false;

Review comment:
@jiajunwang In `CurrentStateOutput`, could we add a top state counter map that caches the number of top state replicas per partition, like below?
Then we could avoid the stream filter computation. The tradeoff is a bit more memory for the cache, but most of it is just references.
```
public void setCurrentState(String resourceName, Partition partition, String instanceName, String state) {
  (...... current code ......)

  // Count the number of top state replicas for a single top state model.
  // Assumes the StateModelDefinition of resourceName is accessible here as stateModelDef.
  if (state.equals(stateModelDef.getTopState())) {
    Map<String, Integer> counterMap = _topStateCounter
        .computeIfAbsent(resourceName, k -> new HashMap<>())
        .computeIfAbsent(partition, k -> new HashMap<>());
    // Increment the count; put(state, getOrDefault(state, 0)) would leave it at 0.
    counterMap.merge(state, 1, Integer::sum);
  }
  // If the instance's previous state was the top state, the count also needs a decrement.
}
```
Not sure if we need to optimize this; maybe you could benchmark it. For this check alone, the cost per partition drops from O(n) to O(1), but I am not sure how much time that actually saves across the whole pipeline. If this scan is what makes the whole pipeline O(N^2), the optimization brings it down to O(N), which may help. If the pipeline is already linear, say O(2 * N), it stays O(N) after the optimization and we only save a constant factor.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
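For illustration, here is a minimal, self-contained Java sketch of the top state counter suggested in the review comment. It is not Helix code. `TopStateCounterSketch` and its methods are hypothetical, partitions are keyed by name instead of Helix's `Partition`, and the top state is passed to the constructor instead of being read from a `StateModelDefinition`. Besides incrementing, it decrements the count when an instance leaves the top state, so calling `setCurrentState` again for the same instance does not double count. `countTopStatesByScan` is the O(n) stream-filter equivalent, kept for comparison.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Helix API: caches the number of top state replicas per
 * (resource, partition) so the excessive-top-state check is an O(1) lookup.
 * Partitions are keyed by name here instead of org.apache.helix.model.Partition.
 */
class TopStateCounterSketch {
  // resource -> partition -> instance -> current state
  private final Map<String, Map<String, Map<String, String>>> currentStates = new HashMap<>();
  // resource -> partition -> number of replicas currently in the top state
  private final Map<String, Map<String, Integer>> topStateCounts = new HashMap<>();
  // Assumption: the top state (e.g. "MASTER") of a single top state model is known up front.
  private final String topState;

  TopStateCounterSketch(String topState) {
    this.topState = topState;
  }

  void setCurrentState(String resource, String partition, String instance, String state) {
    Map<String, String> instanceStates = currentStates
        .computeIfAbsent(resource, k -> new HashMap<>())
        .computeIfAbsent(partition, k -> new HashMap<>());
    // put() returns the instance's previous state, or null if it had none.
    String previous = instanceStates.put(instance, state);
    Map<String, Integer> counts = topStateCounts.computeIfAbsent(resource, k -> new HashMap<>());
    // Undo the old contribution first so overwriting a state never double counts.
    if (topState.equals(previous)) {
      counts.merge(partition, -1, Integer::sum);
    }
    if (topState.equals(state)) {
      counts.merge(partition, 1, Integer::sum);
    }
  }

  /** O(1): reads the cached counter. */
  int getTopStateCount(String resource, String partition) {
    return topStateCounts.getOrDefault(resource, Map.of()).getOrDefault(partition, 0);
  }

  /** O(n) in the number of replicas: equivalent to the stream filter in isCurrentStatesValid. */
  long countTopStatesByScan(String resource, String partition) {
    return currentStates.getOrDefault(resource, Map.of())
        .getOrDefault(partition, Map.of())
        .values().stream()
        .filter(topState::equals)
        .count();
  }

  /** Valid unless more than one replica is in the top state. */
  boolean isCurrentStatesValid(String resource, String partition) {
    return getTopStateCount(resource, partition) <= 1;
  }
}
```

Removing an instance's current state entirely would need the same decrement; the sketch leaves that path out.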
