[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance

2019-09-13 Thread GitBox
narendly commented on a change in pull request #472: Change the way Helix 
triggers rebalance
URL: https://github.com/apache/helix/pull/472#discussion_r324264090
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##
 @@ -357,6 +357,30 @@ public void scheduleRebalance(long rebalanceTime) {
 }
   }
 
+  /**
+   * schedule an instant rebalance pipeline.
+   */
+  public void scheduleInstantRebalance() {
+if (_helixManager == null) {
+  logger.warn("Failed to schedule a future pipeline run for cluster " + 
_clusterName
+  + " helix manager is null!");
+  return;
+}
+if (_onDemandRebalanceTimer == null) {
 
 Review comment:
   Can we initialize this in the constructor?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance

2019-09-13 Thread GitBox
narendly commented on a change in pull request #472: Change the way Helix 
triggers rebalance
URL: https://github.com/apache/helix/pull/472#discussion_r324266618
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
 ##
 @@ -150,9 +153,11 @@ public static void invokeRebalance(HelixDataAccessor 
accessor, String resource)
 
   /**
* Trigger the controller to perform rebalance for a given resource.
+   * This function is deprecated. Please use 
RebalanceUtil.scheduleInstantPipeline method instead.
 
 Review comment:
   Put this at the top of the comment?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance

2019-09-13 Thread GitBox
narendly commented on a change in pull request #472: Change the way Helix 
triggers rebalance
URL: https://github.com/apache/helix/pull/472#discussion_r324266265
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##
 @@ -1197,4 +1223,4 @@ private void initPipeline(Thread eventThread, 
BaseControllerDataProvider cache)
 eventThread.setDaemon(true);
 eventThread.start();
   }
-}
\ No newline at end of file
+}
 
 Review comment:
   Remove this


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance

2019-09-13 Thread GitBox
narendly commented on a change in pull request #472: Change the way Helix 
triggers rebalance
URL: https://github.com/apache/helix/pull/472#discussion_r324402481
 
 

 ##
 File path: helix-core/src/main/java/org/apache/helix/util/RebalanceUtil.java
 ##
 @@ -136,4 +141,13 @@
 result[1] = slaveStateValue;
 return result;
   }
+
+  public static void scheduleInstantPipeline(String clusterName) {
+GenericHelixController controller = 
GenericHelixController.getController(clusterName);
+if (controller != null) {
+  controller.scheduleInstantRebalance();
+} else {
+  LOG.warn("Failed to issue a pipeline run for cluster {}.", clusterName);
 
 Review comment:
   Can you make the log message clear? Under what circumstances would 
controller be null?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance

2019-09-13 Thread GitBox
narendly commented on a change in pull request #472: Change the way Helix 
triggers rebalance
URL: https://github.com/apache/helix/pull/472#discussion_r324266592
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/RebalanceScheduler.java
 ##
 @@ -124,15 +125,17 @@ public RebalanceInvoker(HelixManager manager, String 
resource) {
 
 @Override
 public void run() {
-  invokeRebalance(_manager.getHelixDataAccessor(), _resource);
+  RebalanceUtil.scheduleInstantPipeline(_manager.getClusterName());
 }
   }
 
   /**
* Trigger the controller to perform rebalance for a given resource.
+   * This function is deprecated. Please use 
RebalanceUtil.scheduleInstantPipeline method instead.
 
 Review comment:
   Put this at the top of the comment?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #472: Change the way Helix triggers rebalance

2019-09-13 Thread GitBox
narendly commented on a change in pull request #472: Change the way Helix 
triggers rebalance
URL: https://github.com/apache/helix/pull/472#discussion_r324265733
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##
 @@ -357,6 +357,30 @@ public void scheduleRebalance(long rebalanceTime) {
 }
   }
 
+  /**
+   * schedule an instant rebalance pipeline.
+   */
+  public void scheduleInstantRebalance() {
+if (_helixManager == null) {
+  logger.warn("Failed to schedule a future pipeline run for cluster " + 
_clusterName
+  + " helix manager is null!");
+  return;
+}
+if (_onDemandRebalanceTimer == null) {
+  _onDemandRebalanceTimer = new Timer(true);
+}
+
+RebalanceTask newTask = new RebalanceTask(_helixManager, 
ClusterEventType.OnDemandRebalance);
+
+_onDemandRebalanceTimer.schedule(newTask, 0);
+logger.info("Scheduled instant pipeline run for cluster " + 
_helixManager.getClusterName());
 
 Review comment:
   Nit: use {}


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang merged pull request #458: Selectively update cache and notify for every change in CurrentStates based RoutingTableProvider.

2019-09-13 Thread GitBox
jiajunwang merged pull request #458: Selectively update cache and notify for 
every change in CurrentStates based RoutingTableProvider.
URL: https://github.com/apache/helix/pull/458
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] pkuwm commented on issue #458: Selectively update cache and notify for every change in CurrentStates based RoutingTableProvider.

2019-09-13 Thread GitBox
pkuwm commented on issue #458: Selectively update cache and notify for every 
change in CurrentStates based RoutingTableProvider.
URL: https://github.com/apache/helix/pull/458#issuecomment-531426451
 
 
   Final comment for the commit:
   title: 
   Fix missed callbacks in CurrentStates based RoutingTableProvider.
   description: 
   1. Update BasicClusterDataCache to do refresh with selective update. Only 
when a change happens, we do the cache refresh only for that change type.
   2. Improve RoutingTableProvider.queueEvent() and 
RoutingTableProvider.handleEvent(). Return instanceConfigs snapshot to callback 
immediately, instead of waiting for currentStates completion.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #474: 
PartitionMovementSoftConstraint Implementation
URL: https://github.com/apache/helix/pull/474#discussion_r324396502
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/PartitionMovementConstraint.java
 ##
 @@ -0,0 +1,95 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collections;
+import java.util.Map;
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+import org.apache.helix.model.Partition;
+import org.apache.helix.model.ResourceAssignment;
+
+/**
+ * Evaluate the proposed assignment according to the potential partition 
movements cost.
+ * The cost is evaluated based on the difference between the old assignment 
and the new assignment.
+ * In detail, we consider the following two previous assignments as the base.
+ * - Baseline assignment that is calculated regardless of the node state 
(online/offline).
+ * - Previous Best Possible assignment.
+ * Any change to these two assignments will increase the partition movements 
cost, so that the
+ * evaluated score will become lower.
+ */
+class PartitionMovementConstraint extends SoftConstraint {
+  private static final float MAX_SCORE = 1f;
+  private static final float MIN_SCORE = 0f;
+  // This factor indicates the default score that is evaluated if only 
partition allocation matches
 
 Review comment:
   Please add TODO here, these 2 numbers also need to be tuned.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #474: 
PartitionMovementSoftConstraint Implementation
URL: https://github.com/apache/helix/pull/474#discussion_r324396242
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterContext.java
 ##
 @@ -68,6 +74,16 @@
 
 _estimatedMaxPartitionCount = estimateAvgReplicaCount(totalReplicas, 
instanceCount);
 _estimatedMaxTopStateCount = 
estimateAvgReplicaCount(totalTopStateReplicas, instanceCount);
+_baselineAssignment = baselineAssignment;
+_bestPossibleAssignment = bestPossibleAssignment;
+  }
+
+  public Map getBaselineAssignment() {
+return _baselineAssignment;
 
 Review comment:
   Safer to ensure we always return an empty map if the _baselineAssignment is 
null.
   I know it is not possible with the current code. Just to be safe.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #474: 
PartitionMovementSoftConstraint Implementation
URL: https://github.com/apache/helix/pull/474#discussion_r324396260
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/ClusterContext.java
 ##
 @@ -68,6 +74,16 @@
 
 _estimatedMaxPartitionCount = estimateAvgReplicaCount(totalReplicas, 
instanceCount);
 _estimatedMaxTopStateCount = 
estimateAvgReplicaCount(totalTopStateReplicas, instanceCount);
+_baselineAssignment = baselineAssignment;
+_bestPossibleAssignment = bestPossibleAssignment;
+  }
+
+  public Map getBaselineAssignment() {
+return _baselineAssignment;
+  }
+
+  public Map getBestPossibleAssignment() {
+return _bestPossibleAssignment;
 
 Review comment:
   Same here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #474: PartitionMovementSoftConstraint Implementation

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #474: 
PartitionMovementSoftConstraint Implementation
URL: https://github.com/apache/helix/pull/474#discussion_r324396615
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/PartitionMovementConstraint.java
 ##
 @@ -0,0 +1,95 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import java.util.Collections;
+import java.util.Map;
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+import org.apache.helix.model.Partition;
+import org.apache.helix.model.ResourceAssignment;
+
+/**
+ * Evaluate the proposed assignment according to the potential partition 
movements cost.
+ * The cost is evaluated based on the difference between the old assignment 
and the new assignment.
+ * In detail, we consider the following two previous assignments as the base.
+ * - Baseline assignment that is calculated regardless of the node state 
(online/offline).
+ * - Previous Best Possible assignment.
+ * Any change to these two assignments will increase the partition movements 
cost, so that the
+ * evaluated score will become lower.
+ */
+class PartitionMovementConstraint extends SoftConstraint {
+  private static final float MAX_SCORE = 1f;
+  private static final float MIN_SCORE = 0f;
+  // This factor indicates the default score that is evaluated if only 
partition allocation matches
+  // (states are different).
+  private static final float ALLOCATION_MATCH_FACTOR = 0.5f;
+  // This factor indicates the contribution of the Baseline assignment 
matching to the final score.
+  private static final float BASELINE_MATCH_FACTOR = 0.25f;
+
+  PartitionMovementConstraint() {
+super(MAX_SCORE, MIN_SCORE);
+  }
+
+  @Override
+  protected float getAssignmentScore(AssignableNode node, AssignableReplica 
replica,
+  ClusterContext clusterContext) {
+Map bestPossibleStateMap =
+getStateMap(replica, clusterContext.getBestPossibleAssignment());
+Map baselineStateMap =
+getStateMap(replica, clusterContext.getBaselineAssignment());
+
+// Prioritize the matching of the previous Best Possible assignment.
+float scale = calculateAssignmentScale(node, replica, 
bestPossibleStateMap);
+// If the baseline is also provided, adjust the final score accordingly.
+scale = scale * (1 - BASELINE_MATCH_FACTOR)
 
 Review comment:
   If the baselineStateMap is empty, this calculation is not necessary.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324373074
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java
 ##
 @@ -349,4 +338,61 @@ private IdealState 
generateIdealStateWithAssignment(String resourceName,
 }
 return preferenceList;
   }
+
+  private Map getBaseline(
+  AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput 
currentStateOutput,
+  Set resources) throws HelixRebalanceException {
+Map currentBaseline;
+try {
+  currentBaseline = assignmentMetadataStore.getBaseline();
+} catch (HelixException hex) {
+  LOG.warn("Failed to get the current baseline assignment. Use the current 
states instead",
+  hex);
+  currentBaseline = getCurrentStateAssingment(currentStateOutput, 
resources);
+} catch (Exception ex) {
+  throw new HelixRebalanceException(
+  "Failed to get the current baseline assignment because of unexpected 
error.",
+  HelixRebalanceException.Type.INVALID_REBALANCER_STATUS, ex);
+}
+return currentBaseline;
+  }
+
+  private Map getBestPossibleAssignment(
+  AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput 
currentStateOutput,
+  Set resources) throws HelixRebalanceException {
+Map currentBestAssignment;
+try {
+  // TODO fix the deserialize
 
 Review comment:
   The issus has been fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on issue #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
jiajunwang commented on issue #466: Integrate the WAGED rebalancer with all the 
related components.
URL: https://github.com/apache/helix/pull/466#issuecomment-531395357
 
 
   > If the constraint factory/wagedRebalancer fails to produce the mapping, 
should we put the cluster in maintenance mode or treat it as a NOP? I'd like to 
guide us to think about in what scenarios putting the cluster under maintenance 
would be appropriate.
   
   The current design returns the previous mapping if a calculation fails (this 
is not done yet). Entering maintenance mode means we need to auto exist as 
well. This makes maintenance mode management more complicated. So I would 
prefer keeping it simple for now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] i3wangyi opened a new pull request #474: PartitionMovementSoftConstraint Implementation

2019-09-13 Thread GitBox
i3wangyi opened a new pull request #474: PartitionMovementSoftConstraint 
Implementation
URL: https://github.com/apache/helix/pull/474
 
 
   ### Issues
   
   - [ ] My PR addresses the following Helix issues and references them in the 
PR description:
   
   (#200 - Link your issue number here: You can write "Fixes #XXX")
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   (Write a concise description including what, why, how)
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   (List the names of added unit/integration tests)
   
   - [ ] The following is the result of the "mvn test" command on the 
appropriate module:
   
   (Copy & paste the result of "mvn test")
   
   ### Commits
   
   - [ ] My commits all reference appropriate Apache Helix GitHub issues in 
their subject lines, and I have squashed multiple commits if they address the 
same issue. In addition, my commits follow the guidelines from "[How to write a 
good git commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation in the 
following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Code Quality
   
   - [ ] My diff has been formatted using helix-style.xml


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] mcvsubbu opened a new issue #473: Helix 0.9.1 causes intermittent failures in pinot integration tests

2019-09-13 Thread GitBox
mcvsubbu opened a new issue #473: Helix 0.9.1 causes intermittent failures in 
pinot integration tests
URL: https://github.com/apache/helix/issues/473
 
 
   helix-0.9.1 causes intermittent failures in pinot integration tests. The 
failures are mostly timeouts because most pinot integration tests end up 
starting up a pinot cluster (controller, server, broker), creating some tables 
(helix resources), uploading segments (helix partitions on the tables created), 
and awaiting for all those partitions to be ONLINE.
   
   Intermittently, tests time-out waiting for this setup  step (before running 
the actual tests).
   
   Here is one way to reproduce one of the failures fairly quickly (at least, 
on my Linux desktop):
   
   Download the patch file attached as /tmp/add-logs.patch, and execute the 
following commands:
   
   ```
   PATCH_FILE="/tmp/add-logs.patch"
   git clone https://github.com/apache/incubator-pinot.git
   git checkout try_hotfix_helix
   
   mvn clean install -Pbuild-shaded-jar -DskipTests
   cd pinot-controller
   patch -p2 < ${PATCH_FILE}
   while [ true ]; do echo Starting at `date`; mvn test 
-Dtest=ControllerInstanceToggleTest 1> ~/fail.local.logs 2>&1; if [ $? -ne 0 ]; 
then break; fi; done; echo "FAILED. See ~/fail.local.logs"
   ```
   
   The patch is as follows:
   
   ```
   diff --git 
a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java
 
b/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java
   index 79788c1f7..b52a0a6a5 100644
   --- 
a/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java
   +++ 
b/pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java
   @@ -129,14 +129,17 @@ public class PinotInstanceRestletResource {
  Response.Status.NOT_FOUND);
}

   +PinotResourceManagerResponse resp;
if (StateType.ENABLE.name().equalsIgnoreCase(state)) {
   -  if 
(!pinotHelixResourceManager.enableInstance(instanceName).isSuccessful()) {
   -throw new ControllerApplicationException(LOGGER, "Failed to enable 
instance " + instanceName,
   +  resp = pinotHelixResourceManager.enableInstance(instanceName);
   +  if (!resp.isSuccessful()) {
   +throw new ControllerApplicationException(LOGGER, "Failed to enable 
instance " + instanceName + ":" + resp.getMessage(),
Response.Status.INTERNAL_SERVER_ERROR);
  }
} else if (StateType.DISABLE.name().equalsIgnoreCase(state)) {
   -  if 
(!pinotHelixResourceManager.disableInstance(instanceName).isSuccessful()) {
   -throw new ControllerApplicationException(LOGGER, "Failed to disable 
instance " + instanceName,
   +  resp = pinotHelixResourceManager.disableInstance(instanceName);
   +  if (!resp.isSuccessful()) {
   +throw new ControllerApplicationException(LOGGER, "Failed to disable 
instance " + instanceName + ":" + resp.getMessage(),
Response.Status.INTERNAL_SERVER_ERROR);
  }
} else if (StateType.DROP.name().equalsIgnoreCase(state)) {
   diff --git 
a/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
 
b/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
   index 0ceb65f47..4bc7fcfad 100644
   --- 
a/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
   +++ 
b/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
   @@ -2170,7 +2170,7 @@ public class PinotHelixResourceManager {
  PropertyKey liveInstanceKey = _keyBuilder.liveInstance(instanceName);
  LiveInstance liveInstance = 
_helixDataAccessor.getProperty(liveInstanceKey);
  if (liveInstance == null) {
   -return toggle ? PinotResourceManagerResponse.FAILURE : 
PinotResourceManagerResponse.SUCCESS;
   +return toggle ? PinotResourceManagerResponse.failure("liveInstance 
null for " + instanceName) : PinotResourceManagerResponse.SUCCESS;
  }
  PropertyKey instanceCurrentStatesKey = 
_keyBuilder.currentStates(instanceName, liveInstance.getSessionId());
  List instanceCurrentStates = 
_helixDataAccessor.getChildValues(instanceCurrentStatesKey);
   @@ -2195,7 +2195,7 @@ public class PinotHelixResourceManager {
}
  }
}
   -return PinotResourceManagerResponse.failure("Instance enable/disable 
failed, timeout");
   +return PinotResourceManagerResponse.failure("Instance enable/disable 
failed, timeout for instance " + instanceName);
  }

  @Nonnull
   diff --git 
a/pinot-controller/src/test/java/org/apache/pinot/controller/helix/ControllerInstanceToggleTest.java
 

[GitHub] [helix] i3wangyi commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
i3wangyi commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324314076
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java
 ##
 @@ -241,21 +232,18 @@ private void 
refreshBaseline(ResourceControllerDataProvider clusterData,
 
   private Map partialRebalance(
   ResourceControllerDataProvider clusterData,
-  Map> clusterChanges, Map resourceMap)
-  throws HelixRebalanceException {
+  Map> clusterChanges, Map resourceMap,
+  final CurrentStateOutput currentStateOutput) throws 
HelixRebalanceException {
 
 Review comment:
   I see the TODO, however, the parameter is passed along from method A -> B -> 
C -> D, it's actually quite difficult to follow 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint

2019-09-13 Thread GitBox
i3wangyi commented on a change in pull request #464: Add soft constraint: 
ResourcePartitionAntiAffinityConstraint
URL: https://github.com/apache/helix/pull/464#discussion_r324312068
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourcePartitionAntiAffinityConstraint.java
 ##
 @@ -0,0 +1,44 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * Evaluate the proposed assignment according to the other partitions 
assignment within the scope of
+ * the same resource.
+ * If the resource of the partition overall has a light load on the instance, 
the score is higher compared to
+ * case when the resource is heavily loaded on the instance
+ */
+public class ResourcePartitionAntiAffinityConstraint extends SoftConstraint {
+
+  @Override
+  protected float getAssignmentScore(AssignableNode node, AssignableReplica 
replica,
+  ClusterContext clusterContext) {
+String resource = replica.getResourceName();
+int curPartitionCountForResource = 
node.getAssignedPartitionsByResource(resource).size();
+int doubleMaxPartitionCountForResource =
+2 * clusterContext.getEstimatedMaxPartitionByResource(resource);
+return Math.max(((float) doubleMaxPartitionCountForResource - 
curPartitionCountForResource)
 
 Review comment:
   Updated the min/max values.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint

2019-09-13 Thread GitBox
i3wangyi commented on a change in pull request #464: Add soft constraint: 
ResourcePartitionAntiAffinityConstraint
URL: https://github.com/apache/helix/pull/464#discussion_r324311995
 
 

 ##
 File path: 
helix-core/src/test/java/org/apache/helix/controller/rebalancer/waged/constraints/TestResourcePartitionAntiAffinityConstraint.java
 ##
 @@ -0,0 +1,63 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import static org.mockito.Mockito.when;
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+import org.mockito.Mockito;
+import org.testng.Assert;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableSet;
+
+import java.util.Collections;
+
+public class TestResourcePartitionAntiAffinityConstraint {
 
 Review comment:
   Sure, sounds good!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] i3wangyi commented on a change in pull request #464: Add soft constraint: ResourcePartitionAntiAffinityConstraint

2019-09-13 Thread GitBox
i3wangyi commented on a change in pull request #464: Add soft constraint: 
ResourcePartitionAntiAffinityConstraint
URL: https://github.com/apache/helix/pull/464#discussion_r324310252
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/constraints/ResourcePartitionAntiAffinityConstraint.java
 ##
 @@ -0,0 +1,44 @@
+package org.apache.helix.controller.rebalancer.waged.constraints;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.controller.rebalancer.waged.model.AssignableNode;
+import org.apache.helix.controller.rebalancer.waged.model.AssignableReplica;
+import org.apache.helix.controller.rebalancer.waged.model.ClusterContext;
+
+/**
+ * Evaluate the proposed assignment according to the other partitions 
assignment within the scope of
 
 Review comment:
   Done. Much clearer!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324309733
 
 

 ##
 File path: 
helix-core/src/test/java/org/apache/helix/controller/rebalancer/waged/model/AbstractTestClusterModel.java
 ##
 @@ -105,6 +105,7 @@ protected ResourceControllerDataProvider 
setupClusterDataCache() throws IOExcept
 testClusterConfig.setDisabledInstances(Collections.emptyMap());
 testClusterConfig.setTopologyAwareEnabled(false);
 testClusterConfig.setInstanceCapacityKeys(new 
ArrayList<>(_capacityDataMap.keySet()));
+testClusterConfig.setTopologyAwareEnabled(true);
 
 Review comment:
   Good catch. The rebalance used to not support it. Now we should test with 
more complicated logic.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324308536
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java
 ##
 @@ -349,4 +333,82 @@ private IdealState 
generateIdealStateWithAssignment(String resourceName,
 }
 return preferenceList;
   }
+
+  /**
+   * @param assignmentMetadataStore
+   * @param currentStateOutput
+   * @param resources
+   * @return The current baseline assignment. If record does not exist in the
+   * assignmentMetadataStore, return the current state assignment.
+   * @throws HelixRebalanceException
+   */
+  private Map getBaselineAssignment(
+  AssignmentMetadataStore assignmentMetadataStore, CurrentStateOutput 
currentStateOutput,
+  Set resources) throws HelixRebalanceException {
+Map currentBaseline;
+try {
+  currentBaseline = assignmentMetadataStore.getBaseline();
+  if (currentBaseline.isEmpty()) {
+LOG.warn(
+"The current baseline assignment record is empty. Use the current 
states instead.");
 
 Review comment:
   Make sense, but the event id is in the stage class only. Event object is not 
passed to the rebalancer.
   Let's revisit this after the major logic is done.
   
   This point applies to all the other rebalancers as well. I think.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] jiajunwang commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
jiajunwang commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324303760
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/HelixRebalanceException.java
 ##
 @@ -27,6 +27,7 @@
 INVALID_CLUSTER_STATUS,
 INVALID_REBALANCER_STATUS,
 FAILED_TO_CALCULATE,
+INVALID_INPUT,
 
 Review comment:
   I agree. This shall be done when we finish the error reporting feature. The 
description will be fully used when it is reported in either log or some alerts.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
narendly commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324049647
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/HelixRebalanceException.java
 ##
 @@ -27,6 +27,7 @@
 INVALID_CLUSTER_STATUS,
 INVALID_REBALANCER_STATUS,
 FAILED_TO_CALCULATE,
+INVALID_INPUT,
 
 Review comment:
   I think it would be a good idea to actually add some description about each 
of these failure modes?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
narendly commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324051056
 
 

 ##
 File path: 
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java
 ##
 @@ -118,27 +110,26 @@ protected WagedRebalancer(AssignmentMetadataStore 
assignmentMetadataStore,
   public Map 
computeNewIdealStates(ResourceControllerDataProvider clusterData,
   Map resourceMap, final CurrentStateOutput 
currentStateOutput)
   throws HelixRebalanceException {
-LOG.info("Start computing new ideal states for resources: {}", 
resourceMap.keySet().toString());
+if (resourceMap.isEmpty()) {
+  LOG.warn("There is no resource to be rebalanced by {}",
+  this.getClass().getSimpleName());
+  return Collections.emptyMap();
+}
 
-// Find the compatible resources: 1. FULL_AUTO 2. Configured to use the 
WAGED rebalancer
-resourceMap = resourceMap.entrySet().stream().filter(resourceEntry -> {
+LOG.info("Start computing new ideal states for resources: {}", 
resourceMap.keySet().toString());
+if (!resourceMap.entrySet().stream().allMatch(resourceEntry -> {
   IdealState is = clusterData.getIdealState(resourceEntry.getKey());
   return is != null && 
is.getRebalanceMode().equals(IdealState.RebalanceMode.FULL_AUTO)
   && getClass().getName().equals(is.getRebalancerClassName());
-}).collect(Collectors.toMap(resourceEntry -> resourceEntry.getKey(),
-resourceEntry -> resourceEntry.getValue()));
-
-if (resourceMap.isEmpty()) {
-  LOG.warn("There is no valid resource to be rebalanced by {}",
-  this.getClass().getSimpleName());
-  return Collections.emptyMap();
-} else {
-  LOG.info("Valid resources that will be rebalanced by {}: {}", 
this.getClass().getSimpleName(),
-  resourceMap.keySet().toString());
+})) {
+  throw new HelixRebalanceException(
+  "Input contains invalid resource(s) that cannot be rebalanced by the 
WAGED rebalancer.",
 
 Review comment:
   I think that the occurrence for this Exception would be low, so would it be 
a better idea to actually log the names of resources that cannot be rebalanced?
   
   From the operator's perspective, it would be very, very helpful. If I just 
saw this exception without the actual names of the resources, this error 
message is almost useless. Say you have 100 resources, and the rebalance failed 
due to 1 of them - then you would have to loop through all 100 of them just to 
find the problem child!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org



[GitHub] [helix] narendly commented on a change in pull request #466: Integrate the WAGED rebalancer with all the related components.

2019-09-13 Thread GitBox
narendly commented on a change in pull request #466: Integrate the WAGED 
rebalancer with all the related components.
URL: https://github.com/apache/helix/pull/466#discussion_r324050208
 
 

 ##
 File path: 
helix-core/src/test/java/org/apache/helix/controller/rebalancer/waged/model/AbstractTestClusterModel.java
 ##
 @@ -105,6 +105,7 @@ protected ResourceControllerDataProvider 
setupClusterDataCache() throws IOExcept
 testClusterConfig.setDisabledInstances(Collections.emptyMap());
 testClusterConfig.setTopologyAwareEnabled(false);
 testClusterConfig.setInstanceCapacityKeys(new 
ArrayList<>(_capacityDataMap.keySet()));
+testClusterConfig.setTopologyAwareEnabled(true);
 
 Review comment:
   Why do you set this field to false and back to true? Remove line 106?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@helix.apache.org
For additional commands, e-mail: reviews-h...@helix.apache.org