[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance
narendly commented on a change in pull request #472: Change the way Helix triggers rebalance URL: https://github.com/apache/helix/pull/472#discussion_r324264090 ## File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java ## @@ -357,6 +357,30 @@ public void scheduleRebalance(long rebalanceTime) { } } + /** + * schedule an instant rebalance pipeline. + */ + public void scheduleInstantRebalance() { +if (_helixManager == null) { + logger.warn("Failed to schedule a future pipeline run for cluster " + _clusterName + + " helix manager is null!"); + return; +} +if (_onDemandRebalanceTimer == null) { Review comment: Can we initialize this in the constructor? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance
narendly commented on a change in pull request #472: Change the way Helix triggers rebalance URL: https://github.com/apache/helix/pull/472#discussion_r324266618 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java ## @@ -150,9 +153,11 @@ public static void invokeRebalance(HelixDataAccessor accessor, String resource) /** * Trigger the controller to perform rebalance for a given resource. + * This function is deprecated. Please use RebalanceUtil.scheduleInstantPipeline method instead. Review comment: Put this at the top of the comment? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance
narendly commented on a change in pull request #472: Change the way Helix triggers rebalance URL: https://github.com/apache/helix/pull/472#discussion_r324266265 ## File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java ## @@ -1197,4 +1223,4 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache) eventThread.setDaemon(true); eventThread.start(); } -} \ No newline at end of file +} Review comment: Remove this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance
narendly commented on a change in pull request #472: Change the way Helix triggers rebalance URL: https://github.com/apache/helix/pull/472#discussion_r324402481 ## File path: helix-core/src/main/java/org/apache/helix/util/RebalanceUtil.java ## @@ -136,4 +141,13 @@ result[1] = slaveStateValue; return result; } + + public static void scheduleInstantPipeline(String clusterName) { +GenericHelixController controller = GenericHelixController.getController(clusterName); +if (controller != null) { + controller.scheduleInstantRebalance(); +} else { + LOG.warn("Failed to issue a pipeline run for cluster {}.", clusterName); Review comment: Can you make the log message clear? Under what circumstances would controller be null? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance
narendly commented on a change in pull request #472: Change the way Helix triggers rebalance URL: https://github.com/apache/helix/pull/472#discussion_r324266592 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java ## @@ -124,15 +125,17 @@ public RebalanceInvoker(HelixManager manager, String resource) { @Override public void run() { - invokeRebalance(_manager.getHelixDataAccessor(), _resource); + RebalanceUtil.scheduleInstantPipeline(_manager.getClusterName()); } } /** * Trigger the controller to perform rebalance for a given resource. + * This function is deprecated. Please use RebalanceUtil.scheduleInstantPipeline method instead. Review comment: Put this at the top of the comment? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance
narendly commented on a change in pull request #472: Change the way Helix triggers rebalance URL: https://github.com/apache/helix/pull/472#discussion_r324265733 ## File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java ## @@ -357,6 +357,30 @@ public void scheduleRebalance(long rebalanceTime) { } } + /** + * schedule an instant rebalance pipeline. + */ + public void scheduleInstantRebalance() { +if (_helixManager == null) { + logger.warn("Failed to schedule a future pipeline run for cluster " + _clusterName + + " helix manager is null!"); + return; +} +if (_onDemandRebalanceTimer == null) { + _onDemandRebalanceTimer = new Timer(true); +} + +RebalanceTask newTask = new RebalanceTask(_helixManager, ClusterEventType.OnDemandRebalance); + +_onDemandRebalanceTimer.schedule(newTask, 0); +logger.info("Scheduled instant pipeline run for cluster " + _helixManager.getClusterName()); Review comment: Nit: use {} This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang merged pull request #458: Selectively update cache and notify for every change in CurrentStates based RoutingTableProvider.
jiajunwang merged pull request #458: Selectively update cache and notify for every change in CurrentStates based RoutingTableProvider. URL: https://github.com/apache/helix/pull/458 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] pkuwm commented on issue #458: Selectively update cache and notify for every change in CurrentStates based RoutingTableProvider.
pkuwm commented on issue #458: Selectively update cache and notify for every change in CurrentStates based RoutingTableProvider. URL: https://github.com/apache/helix/pull/458#issuecomment-531426451 Final comment for the commit: title: Fix missed callbacks in CurrentStates based RoutingTableProvider. description: 1. Update BasicClusterDataCache to do refresh with selective update. Only when a change happens, we do the cache refresh only for that change type. 2. Improve RoutingTableProvider.queueEvent() and RoutingTableProvider.handleEvent(). Return instanceConfigs snapshot to callback immediately, instead of waiting for currentStates completion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation
jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation URL: https://github.com/apache/helix/pull/474#discussion_r324396502 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/PartitionMovementConstraint.java ## @@ -0,0 +1,95 @@ +package org.apache.helix.controller.rebalancer.waged.constraints; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import java.util.Collections; +import java.util.Map; + +import org.apache.helix.controller.rebalancer.waged.model.AssignableNode; +import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica; +import org.apache.helix.controller.rebalancer.waged.model.ClusterContext; +import org.apache.helix.model.Partition; +import org.apache.helix.model.ResourceAssignment; + +/** + * Evaluate the proposed assignment according to the potential partition movements cost. + * The cost is evaluated based on the difference between the old assignment and the new assignment. + * In detail, we consider the following two previous assignments as the base. + * - Baseline assignment that is calculated regardless of the node state (online/offline). + * - Previous Best Possible assignment. + * Any change to these two assignments will increase the partition movements cost, so that the + * evaluated score will become lower. + */ +class PartitionMovementConstraint extends SoftConstraint { + private static final float MAX_SCORE = 1f; + private static final float MIN_SCORE = 0f; + // This factor indicates the default score that is evaluated if only partition allocation matches Review comment: Please add TODO here, these 2 numbers also need to be tuned. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation
jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation URL: https://github.com/apache/helix/pull/474#discussion_r324396242 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterContext.java ## @@ -68,6 +74,16 @@ _estimatedMaxPartitionCount = estimateAvgReplicaCount(totalReplicas, instanceCount); _estimatedMaxTopStateCount = estimateAvgReplicaCount(totalTopStateReplicas, instanceCount); +_baselineAssignment = baselineAssignment; +_bestPossibleAssignment = bestPossibleAssignment; + } + + public Map getBaselineAssignment() { +return _baselineAssignment; Review comment: Safer to ensure we always return an empty map if the _baselineAssignment is null. I know it is not possible with the current code. Just to be safe. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation
jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation URL: https://github.com/apache/helix/pull/474#discussion_r324396260 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterContext.java ## @@ -68,6 +74,16 @@ _estimatedMaxPartitionCount = estimateAvgReplicaCount(totalReplicas, instanceCount); _estimatedMaxTopStateCount = estimateAvgReplicaCount(totalTopStateReplicas, instanceCount); +_baselineAssignment = baselineAssignment; +_bestPossibleAssignment = bestPossibleAssignment; + } + + public Map getBaselineAssignment() { +return _baselineAssignment; + } + + public Map getBestPossibleAssignment() { +return _bestPossibleAssignment; Review comment: Same here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation
jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation URL: https://github.com/apache/helix/pull/474#discussion_r324396615 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/PartitionMovementConstraint.java ## @@ -0,0 +1,95 @@ +package org.apache.helix.controller.rebalancer.waged.constraints; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import java.util.Collections; +import java.util.Map; + +import org.apache.helix.controller.rebalancer.waged.model.AssignableNode; +import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica; +import org.apache.helix.controller.rebalancer.waged.model.ClusterContext; +import org.apache.helix.model.Partition; +import org.apache.helix.model.ResourceAssignment; + +/** + * Evaluate the proposed assignment according to the potential partition movements cost. + * The cost is evaluated based on the difference between the old assignment and the new assignment. + * In detail, we consider the following two previous assignments as the base. + * - Baseline assignment that is calculated regardless of the node state (online/offline). + * - Previous Best Possible assignment. + * Any change to these two assignments will increase the partition movements cost, so that the + * evaluated score will become lower. + */ +class PartitionMovementConstraint extends SoftConstraint { + private static final float MAX_SCORE = 1f; + private static final float MIN_SCORE = 0f; + // This factor indicates the default score that is evaluated if only partition allocation matches + // (states are different). + private static final float ALLOCATION_MATCH_FACTOR = 0.5f; + // This factor indicates the contribution of the Baseline assignment matching to the final score. + private static final float BASELINE_MATCH_FACTOR = 0.25f; + + PartitionMovementConstraint() { +super(MAX_SCORE, MIN_SCORE); + } + + @Override + protected float getAssignmentScore(AssignableNode node, AssignableReplica replica, + ClusterContext clusterContext) { +Map bestPossibleStateMap = +getStateMap(replica, clusterContext.getBestPossibleAssignment()); +Map baselineStateMap = +getStateMap(replica, clusterContext.getBaselineAssignment()); + +// Prioritize the matching of the previous Best Possible assignment. +float scale = calculateAssignmentScale(node, replica, bestPossibleStateMap); +// If the baseline is also provided, adjust the final score accordingly. +scale = scale * (1 - BASELINE_MATCH_FACTOR) Review comment: If the baselineStateMap is empty, this calculation is not necessary. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324373074 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java ## @@ -349,4 +338,61 @@ private IdealState generateIdealStateWithAssignment(String resourceName, } return preferenceList; } + + private Map getBaseline( + AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput currentStateOutput, + Set resources) throws HelixRebalanceException { +Map currentBaseline; +try { + currentBaseline = assignmentMetadataStore.getBaseline(); +} catch (HelixException hex) { + LOG.warn("Failed to get the current baseline assignment. Use the current states instead", + hex); + currentBaseline = getCurrentStateAssingment(currentStateOutput, resources); +} catch (Exception ex) { + throw new HelixRebalanceException( + "Failed to get the current baseline assignment because of unexpected error.", + HelixRebalanceException.Type.INVALID_REBALANCER_STATUS, ex); +} +return currentBaseline; + } + + private Map getBestPossibleAssignment( + AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput currentStateOutput, + Set resources) throws HelixRebalanceException { +Map currentBestAssignment; +try { + // TODO fix the deserialize Review comment: The issus has been fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on issue #466: Integrate the WAGED rebalancer with all the related components.
jiajunwang commented on issue #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#issuecomment-531395357 > If the constraint factory/wagedRebalancer fails to produce the mapping, should we put the cluster in maintenance mode or treat it as a NOP? I'd like to guide us to think about in what scenarios putting the cluster under maintenance would be appropriate. The current design returns the previous mapping if a calculation fails (this is not done yet). Entering maintenance mode means we need to auto exist as well. This makes maintenance mode management more complicated. So I would prefer keeping it simple for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] i3wangyi opened a new pull request #474: PartitionMovementSoftConstraint Implementation
i3wangyi opened a new pull request #474: PartitionMovementSoftConstraint Implementation URL: https://github.com/apache/helix/pull/474 ### Issues - [ ] My PR addresses the following Helix issues and references them in the PR description: (#200 - Link your issue number here: You can write "Fixes #XXX") ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: (Write a concise description including what, why, how) ### Tests - [ ] The following tests are written for this issue: (List the names of added unit/integration tests) - [ ] The following is the result of the "mvn test" command on the appropriate module: (Copy & paste the result of "mvn test") ### Commits - [ ] My commits all reference appropriate Apache Helix GitHub issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation in the following wiki page: (Link the GitHub wiki you added) ### Code Quality - [ ] My diff has been formatted using helix-style.xml This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] mcvsubbu opened a new issue #473: Helix 0.9.1 causes intermittent failures in pinot integration tests
mcvsubbu opened a new issue #473: Helix 0.9.1 causes intermittent failures in pinot integration tests URL: https://github.com/apache/helix/issues/473 helix-0.9.1 causes intermittent failures in pinot integration tests. The failures are mostly timeouts because most pinot integration tests end up starting up a pinot cluster (controller, server, broker), creating some tables (helix resources), uploading segments (helix partitions on the tables created), and awaiting for all those partitions to be ONLINE. Intermittently, tests time-out waiting for this setup step (before running the actual tests). Here is one way to reproduce one of the failures fairly quickly (at least, on my Linux desktop): Download the patch file attached as /tmp/add-logs.patch, and execute the following commands: ``` PATCH_FILE="/tmp/add-logs.patch" git clone https://github.com/apache/incubator-pinot.git git checkout try_hotfix_helix mvn clean install -Pbuild-shaded-jar -DskipTests cd pinot-controller patch -p2 < ${PATCH_FILE} while [ true ]; do echo Starting at `date`; mvn test -Dtest=ControllerInstanceToggleTest 1> ~/fail.local.logs 2>&1; if [ $? -ne 0 ]; then break; fi; done; echo "FAILED. See ~/fail.local.logs" ``` The patch is as follows: ``` diff --git a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java b/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java index 79788c1f7..b52a0a6a5 100644 --- a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java +++ b/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java @@ -129,14 +129,17 @@ public class PinotInstanceRestletResource { Response.Status.NOT_FOUND); } +PinotResourceManagerResponse resp; if (StateType.ENABLE.name().equalsIgnoreCase(state)) { - if (!pinotHelixResourceManager.enableInstance(instanceName).isSuccessful()) { -throw new ControllerApplicationException(LOGGER, "Failed to enable instance " + instanceName, + resp = pinotHelixResourceManager.enableInstance(instanceName); + if (!resp.isSuccessful()) { +throw new ControllerApplicationException(LOGGER, "Failed to enable instance " + instanceName + ":" + resp.getMessage(), Response.Status.INTERNAL_SERVER_ERROR); } } else if (StateType.DISABLE.name().equalsIgnoreCase(state)) { - if (!pinotHelixResourceManager.disableInstance(instanceName).isSuccessful()) { -throw new ControllerApplicationException(LOGGER, "Failed to disable instance " + instanceName, + resp = pinotHelixResourceManager.disableInstance(instanceName); + if (!resp.isSuccessful()) { +throw new ControllerApplicationException(LOGGER, "Failed to disable instance " + instanceName + ":" + resp.getMessage(), Response.Status.INTERNAL_SERVER_ERROR); } } else if (StateType.DROP.name().equalsIgnoreCase(state)) { diff --git a/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java b/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java index 0ceb65f47..4bc7fcfad 100644 --- a/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java +++ b/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java @@ -2170,7 +2170,7 @@ public class PinotHelixResourceManager { PropertyKey liveInstanceKey = _keyBuilder.liveInstance(instanceName); LiveInstance liveInstance = _helixDataAccessor.getProperty(liveInstanceKey); if (liveInstance == null) { -return toggle ? PinotResourceManagerResponse.FAILURE : PinotResourceManagerResponse.SUCCESS; +return toggle ? PinotResourceManagerResponse.failure("liveInstance null for " + instanceName) : PinotResourceManagerResponse.SUCCESS; } PropertyKey instanceCurrentStatesKey = _keyBuilder.currentStates(instanceName, liveInstance.getSessionId()); List instanceCurrentStates = _helixDataAccessor.getChildValues(instanceCurrentStatesKey); @@ -2195,7 +2195,7 @@ public class PinotHelixResourceManager { } } } -return PinotResourceManagerResponse.failure("Instance enable/disable failed, timeout"); +return PinotResourceManagerResponse.failure("Instance enable/disable failed, timeout for instance " + instanceName); } @Nonnull diff --git a/pinot-controller/src/test/java/org/apache/pinot/controller/helix/ControllerInstanceToggleTest.java
[GitHub] [helix] i3wangyi commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
i3wangyi commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324314076 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java ## @@ -241,21 +232,18 @@ private void refreshBaseline(ResourceControllerDataProvider clusterData, private Map partialRebalance( ResourceControllerDataProvider clusterData, - Map> clusterChanges, Map resourceMap) - throws HelixRebalanceException { + Map> clusterChanges, Map resourceMap, + final CurrentStateOutput currentStateOutput) throws HelixRebalanceException { Review comment: I see the TODO, however, the parameter is passed along from method A -> B -> C -> D, it's actually quite difficult to follow This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint
i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint URL: https://github.com/apache/helix/pull/464#discussion_r324312068 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourcePartitionAntiAffinityConstraint.java ## @@ -0,0 +1,44 @@ +package org.apache.helix.controller.rebalancer.waged.constraints; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import org.apache.helix.controller.rebalancer.waged.model.AssignableNode; +import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica; +import org.apache.helix.controller.rebalancer.waged.model.ClusterContext; + +/** + * Evaluate the proposed assignment according to the other partitions assignment within the scope of + * the same resource. + * If the resource of the partition overall has a light load on the instance, the score is higher compared to + * case when the resource is heavily loaded on the instance + */ +public class ResourcePartitionAntiAffinityConstraint extends SoftConstraint { + + @Override + protected float getAssignmentScore(AssignableNode node, AssignableReplica replica, + ClusterContext clusterContext) { +String resource = replica.getResourceName(); +int curPartitionCountForResource = node.getAssignedPartitionsByResource(resource).size(); +int doubleMaxPartitionCountForResource = +2 * clusterContext.getEstimatedMaxPartitionByResource(resource); +return Math.max(((float) doubleMaxPartitionCountForResource - curPartitionCountForResource) Review comment: Updated the min/max values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint
i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint URL: https://github.com/apache/helix/pull/464#discussion_r324311995 ## File path: helix-core/src/test/java/org/apache/helix/controller/rebalancer/waged/constraints/TestResourcePartitionAntiAffinityConstraint.java ## @@ -0,0 +1,63 @@ +package org.apache.helix.controller.rebalancer.waged.constraints; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import static org.mockito.Mockito.when; + +import org.apache.helix.controller.rebalancer.waged.model.AssignableNode; +import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica; +import org.apache.helix.controller.rebalancer.waged.model.ClusterContext; +import org.mockito.Mockito; +import org.testng.Assert; +import org.testng.annotations.Test; + +import com.google.common.collect.ImmutableSet; + +import java.util.Collections; + +public class TestResourcePartitionAntiAffinityConstraint { Review comment: Sure, sounds good! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint
i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint URL: https://github.com/apache/helix/pull/464#discussion_r324310252 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourcePartitionAntiAffinityConstraint.java ## @@ -0,0 +1,44 @@ +package org.apache.helix.controller.rebalancer.waged.constraints; + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import org.apache.helix.controller.rebalancer.waged.model.AssignableNode; +import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica; +import org.apache.helix.controller.rebalancer.waged.model.ClusterContext; + +/** + * Evaluate the proposed assignment according to the other partitions assignment within the scope of Review comment: Done. Much clearer! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324309733 ## File path: helix-core/src/test/java/org/apache/helix/controller/rebalancer/waged/model/AbstractTestClusterModel.java ## @@ -105,6 +105,7 @@ protected ResourceControllerDataProvider setupClusterDataCache() throws IOExcept testClusterConfig.setDisabledInstances(Collections.emptyMap()); testClusterConfig.setTopologyAwareEnabled(false); testClusterConfig.setInstanceCapacityKeys(new ArrayList<>(_capacityDataMap.keySet())); +testClusterConfig.setTopologyAwareEnabled(true); Review comment: Good catch. The rebalance used to not support it. Now we should test with more complicated logic. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324308536 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java ## @@ -349,4 +333,82 @@ private IdealState generateIdealStateWithAssignment(String resourceName, } return preferenceList; } + + /** + * @param assignmentMetadataStore + * @param currentStateOutput + * @param resources + * @return The current baseline assignment. If record does not exist in the + * assignmentMetadataStore, return the current state assignment. + * @throws HelixRebalanceException + */ + private Map getBaselineAssignment( + AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput currentStateOutput, + Set resources) throws HelixRebalanceException { +Map currentBaseline; +try { + currentBaseline = assignmentMetadataStore.getBaseline(); + if (currentBaseline.isEmpty()) { +LOG.warn( +"The current baseline assignment record is empty. Use the current states instead."); Review comment: Make sense, but the event id is in the stage class only. Event object is not passed to the rebalancer. Let's revisit this after the major logic is done. This point applies to all the other rebalancers as well. I think. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324303760 ## File path: helix-core/src/main/java/org/apache/helix/HelixRebalanceException.java ## @@ -27,6 +27,7 @@ INVALID_CLUSTER_STATUS, INVALID_REBALANCER_STATUS, FAILED_TO_CALCULATE, +INVALID_INPUT, Review comment: I agree. This shall be done when we finish the error reporting feature. The description will be fully used when it is reported in either log or some alerts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324049647 ## File path: helix-core/src/main/java/org/apache/helix/HelixRebalanceException.java ## @@ -27,6 +27,7 @@ INVALID_CLUSTER_STATUS, INVALID_REBALANCER_STATUS, FAILED_TO_CALCULATE, +INVALID_INPUT, Review comment: I think it would be a good idea to actually add some description about each of these failure modes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324051056 ## File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java ## @@ -118,27 +110,26 @@ protected WagedRebalancer(AssignmentMetadataStore assignmentMetadataStore, public Map computeNewIdealStates(ResourceControllerDataProvider clusterData, Map resourceMap, final CurrentStateOutput currentStateOutput) throws HelixRebalanceException { -LOG.info("Start computing new ideal states for resources: {}", resourceMap.keySet().toString()); +if (resourceMap.isEmpty()) { + LOG.warn("There is no resource to be rebalanced by {}", + this.getClass().getSimpleName()); + return Collections.emptyMap(); +} -// Find the compatible resources: 1. FULL_AUTO 2. Configured to use the WAGED rebalancer -resourceMap = resourceMap.entrySet().stream().filter(resourceEntry -> { +LOG.info("Start computing new ideal states for resources: {}", resourceMap.keySet().toString()); +if (!resourceMap.entrySet().stream().allMatch(resourceEntry -> { IdealState is = clusterData.getIdealState(resourceEntry.getKey()); return is != null && is.getRebalanceMode().equals(IdealState.RebalanceMode.FULL_AUTO) && getClass().getName().equals(is.getRebalancerClassName()); -}).collect(Collectors.toMap(resourceEntry -> resourceEntry.getKey(), -resourceEntry -> resourceEntry.getValue())); - -if (resourceMap.isEmpty()) { - LOG.warn("There is no valid resource to be rebalanced by {}", - this.getClass().getSimpleName()); - return Collections.emptyMap(); -} else { - LOG.info("Valid resources that will be rebalanced by {}: {}", this.getClass().getSimpleName(), - resourceMap.keySet().toString()); +})) { + throw new HelixRebalanceException( + "Input contains invalid resource(s) that cannot be rebalanced by the WAGED rebalancer.", Review comment: I think that the occurrence for this Exception would be low, so would it be a better idea to actually log the names of resources that cannot be rebalanced? From the operator's perspective, it would be very, very helpful. If I just saw this exception without the actual names of the resources, this error message is almost useless. Say you have 100 resources, and the rebalance failed due to 1 of them - then you would have to loop through all 100 of them just to find the problem child! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org
[GitHub] [helix] narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.
narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components. URL: https://github.com/apache/helix/pull/466#discussion_r324050208 ## File path: helix-core/src/test/java/org/apache/helix/controller/rebalancer/waged/model/AbstractTestClusterModel.java ## @@ -105,6 +105,7 @@ protected ResourceControllerDataProvider setupClusterDataCache() throws IOExcept testClusterConfig.setDisabledInstances(Collections.emptyMap()); testClusterConfig.setTopologyAwareEnabled(false); testClusterConfig.setInstanceCapacityKeys(new ArrayList<>(_capacityDataMap.keySet())); +testClusterConfig.setTopologyAwareEnabled(true); Review comment: Why do you set this field to false and back to true? Remove line 106? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org For additional commands, e-mail: reviews-h...@helix.apache.org