[GitHub] helix pull request #91: [Helix-656] Support customize batch state transition...

2017-05-10 Thread kongweihan
Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/91#discussion_r115862430
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
 ---
@@ -210,6 +213,18 @@ private void updateStateTransitionMessageThreadPool(Message message, HelixManage
       return;
     }
 
+    if (!_batchMessageThreadpoolChecked) {
--- End diff --

It seems to me that as long as there's any `STATE_TRANSITION` message, this 
pool will be created, so why not simply instantiate the thread pool in the 
constructor?
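
A minimal sketch of the suggested alternative: create the pool eagerly in the
constructor instead of checking a flag on every message (class and field names
here are illustrative, not the actual HelixTaskExecutor members):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TaskExecutorSketch {
      private final ExecutorService _batchMessageThreadPool;

      public TaskExecutorSketch(int poolSize) {
        // Created once in the constructor: no _batchMessageThreadpoolChecked
        // flag and no lazy-init race between concurrent message handlers.
        _batchMessageThreadPool = Executors.newFixedThreadPool(poolSize);
      }
    }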


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (HELIX-655) Helix per-participant concurrent task throttling

2017-05-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/HELIX-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005448#comment-16005448 ]

ASF GitHub Bot commented on HELIX-655:
--

Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115852400
  
--- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java 
---
@@ -57,6 +57,7 @@
       new FixedTargetTaskAssignmentCalculator();
   private static TaskAssignmentCalculator genericTaskAssignmentCal =
       new GenericTaskAssignmentCalculator();
+  private static Map<String, Integer> participantActiveTaskCount = new HashMap<String, Integer>();
--- End diff --

This variable name should start with `_`. I believe `genericTaskAssignmentCal` 
above was also named incorrectly.


> Helix per-participant concurrent task throttling
> 
>
> Key: HELIX-655
> URL: https://issues.apache.org/jira/browse/HELIX-655
> Project: Apache Helix
>  Issue Type: New Feature
>  Components: helix-core
>Affects Versions: 0.6.x
>Reporter: Jiajun Wang
>Assignee: Junkai Xue
>
> h1. Overview
> Currently, all runnable jobs/tasks in Helix are treated equally: they are all 
> scheduled according to the rebalancer algorithm. That is, their assignments 
> may differ, but they will all be in the RUNNING state.
> This may cause an issue if there are too many concurrently runnable jobs. 
> When the Helix controller starts all these jobs, the instances may be 
> overloaded as they allocate resources and execute all the tasks. As a result, 
> the jobs won't be able to finish in a reasonable time window.
> The issue is even more critical for long-running jobs. According to our 
> meeting with the Gobblin team, when a job is scheduled, they allocate 
> resources for it. So in the situation described above, more and more 
> resources will be reserved for the pending jobs, and the cluster will soon be 
> exhausted.
> To work around the problem, an application has to schedule jobs at a 
> relatively low frequency (which is what Gobblin does now). This may cause low 
> utilization.
> A better way to fix this issue, at the framework level, is to throttle the 
> jobs/tasks that run concurrently and to allow setting priorities for 
> different jobs to control total execution time.
> Then, given the same number of jobs, the cluster stays in better condition, 
> and the jobs running in it have a more predictable execution time.
> Existing related control mechanisms are:
> * ConcurrentTasksPerInstance for each job
> * ParallelJobs for each workflow
> * Threadpool limitation on the participant if the user customizes 
> TaskStateModelFactory.
> But none of them directly helps when the number of concurrent workflows or 
> jobs is large. If an application keeps scheduling jobs/jobQueues, Helix will 
> start any runnable jobs without considering the workload on the participants.
> The application may be able to carefully configure these items to achieve the 
> goal, but it won't easily find the sweet spot, especially since the cluster 
> may be changing (scaling out, etc.).
> h2. Problem summary
> # All runnable tasks will start executing, which may overload the participants.
> # Applications need a mechanism to prioritize important jobs (or workflows). 
> Otherwise, important tasks may be blocked by less important ones, and the 
> resources allocated to them are wasted.
> h2. Feature proposed
> Based on our discussion, we propose two features that can help resolve the 
> issue:
> # Running-task throttling on each participant, to avoid overload.
> # Job priority control that ensures high-priority jobs are scheduled earlier.
> In addition, applications can leverage workflow/job monitoring items as 
> feedback from Helix to adjust their strategy.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] helix pull request #89: [HELIX-655] Helix per-participant concurrent task th...

2017-05-10 Thread kongweihan
Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115829596
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ---
@@ -204,4 +236,28 @@ private MappingCalculator getMappingCalculator(Rebalancer rebalancer, String res
 
     return mappingCalculator;
   }
+
+  class JobResourcePriority implements Comparable<JobResourcePriority> {
--- End diff --

I would name this differently. This object is not a "priority", but a job 
resource _with_ a priority. Maybe `JobResourceWithPriority`, 
`ComparableJobResource`, or just `JobResource`?
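
A sketch of the suggested rename (the fields shown are hypothetical, since the
diff does not include the body of the class):

    class JobResourceWithPriority implements Comparable<JobResourceWithPriority> {
      private final String _jobName;
      private final long _priority; // assumption: lower value sorts first

      JobResourceWithPriority(String jobName, long priority) {
        _jobName = jobName;
        _priority = priority;
      }

      @Override
      public int compareTo(JobResourceWithPriority other) {
        // Sort job resources by priority so higher-priority ones come first.
        return Long.compare(_priority, other._priority);
      }
    }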




[jira] [Commented] (HELIX-655) Helix per-participant concurrent task throttling

2017-05-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/HELIX-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005449#comment-16005449 ]

ASF GitHub Bot commented on HELIX-655:
--

Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115830770
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ---
@@ -90,60 +96,86 @@ private BestPossibleStateOutput compute(ClusterEvent event, Map





[GitHub] helix pull request #89: [HELIX-655] Helix per-participant concurrent task th...

2017-05-10 Thread kongweihan
Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115848245
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java
 ---
@@ -260,13 +260,72 @@ public Message getPendingState(String resourceName, Partition partition, String
     return partitionSet;
   }
 
+  /**
+   * Get the partitions count for each participant with the pending state and given resource state model
+   * @param resourceStateModel specified resource state model to look up
+   * @param state specified pending resource state to look up
+   * @return map of participant to partition count
+   */
+  public Map<String, Integer> getPartitionCountWithPendingState(String resourceStateModel, String state) {
+    Map<String, Integer> pendingPartitionCount = new HashMap<String, Integer>();
+    for (String resource : _pendingStateMap.keySet()) {
+      String stateModel = _resourceStateModelMap.get(resource);
+      if (stateModel != null && stateModel.equals(resourceStateModel)
+          || stateModel == null && resourceStateModel == null) {
+        for (Partition partition : _pendingStateMap.get(resource).keySet()) {
+          Map<String, Message> partitionMessage = _pendingStateMap.get(resource).get(partition);
+          for (Map.Entry<String, Message> participantMap : partitionMessage.entrySet()) {
+            String participant = participantMap.getKey();
+            if (!pendingPartitionCount.containsKey(participant)) {
+              pendingPartitionCount.put(participant, 0);
+            }
+            String toState = participantMap.getValue().getToState();
+            if (toState != null && toState.equals(state) || toState == null && state == null) {
--- End diff --

Same issue as mentioned above; to me it's a bit confusing at first glance.
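
For what it's worth, the same null-safe comparison reads more directly with
`java.util.Objects` (a sketch assuming Java 7+ is available; `Objects.equals`
covers both the both-null and one-null cases):

    String toState = participantMap.getValue().getToState();
    // Equivalent to: toState != null && toState.equals(state)
    //                || toState == null && state == null
    if (java.util.Objects.equals(toState, state)) {
      pendingPartitionCount.put(participant, pendingPartitionCount.get(participant) + 1);
    }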




[GitHub] helix pull request #89: [HELIX-655] Helix per-participant concurrent task th...

2017-05-10 Thread kongweihan
Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115849569
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java
 ---
@@ -260,13 +260,72 @@ public Message getPendingState(String resourceName, Partition partition, String
     return partitionSet;
   }
 
+  /**
+   * Get the partitions count for each participant with the pending state and given resource state model
+   * @param resourceStateModel specified resource state model to look up
+   * @param state specified pending resource state to look up
+   * @return map of participant to partition count
+   */
+  public Map<String, Integer> getPartitionCountWithPendingState(String resourceStateModel, String state) {
+    Map<String, Integer> pendingPartitionCount = new HashMap<String, Integer>();
+    for (String resource : _pendingStateMap.keySet()) {
+      String stateModel = _resourceStateModelMap.get(resource);
+      if (stateModel != null && stateModel.equals(resourceStateModel)
+          || stateModel == null && resourceStateModel == null) {
+        for (Partition partition : _pendingStateMap.get(resource).keySet()) {
+          Map<String, Message> partitionMessage = _pendingStateMap.get(resource).get(partition);
+          for (Map.Entry<String, Message> participantMap : partitionMessage.entrySet()) {
+            String participant = participantMap.getKey();
+            if (!pendingPartitionCount.containsKey(participant)) {
+              pendingPartitionCount.put(participant, 0);
+            }
+            String toState = participantMap.getValue().getToState();
+            if (toState != null && toState.equals(state) || toState == null && state == null) {
+              pendingPartitionCount.put(participant, pendingPartitionCount.get(participant) + 1);
+            }
+          }
+        }
+      }
+    }
+    return pendingPartitionCount;
+  }
+
+  /**
+   * Get the partitions count for each participant in the current state and with given resource state model
+   * @param resourceStateModel specified resource state model to look up
+   * @param state specified current resource state to look up
+   * @return map of participant to partition count
+   */
+  public Map<String, Integer> getPartitionCountWithCurrentState(String resourceStateModel, String state) {
--- End diff --

This is similar to the method above; would it be better to combine them? 
I see that `_pendingStateMap` contains Messages instead of Strings, which 
makes it a bit hard to abstract. But given its name, shouldn't 
`_pendingStateMap` contain the pending state rather than the message?
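
One possible shape for the combination, sketched under the assumption that the
surrounding class keeps its current fields (`_resourceStateModelMap`,
`_pendingStateMap`, `_currentStateMap`); the helper and its name are
hypothetical, and it relies on `java.util.Objects` from Java 7:

    // Shared counting logic over a resource -> partition -> participant -> state map.
    private Map<String, Integer> countPartitionsPerParticipant(String resourceStateModel,
        String state, Map<String, Map<Partition, Map<String, String>>> stateMap) {
      Map<String, Integer> partitionCount = new HashMap<String, Integer>();
      for (String resource : stateMap.keySet()) {
        String stateModel = _resourceStateModelMap.get(resource);
        if (!Objects.equals(stateModel, resourceStateModel)) {
          continue;
        }
        for (Map<String, String> participantStates : stateMap.get(resource).values()) {
          for (Map.Entry<String, String> entry : participantStates.entrySet()) {
            if (Objects.equals(entry.getValue(), state)) {
              // Note: unlike the original, participants with no matching
              // partitions are simply absent instead of mapped to 0.
              Integer count = partitionCount.get(entry.getKey());
              partitionCount.put(entry.getKey(), count == null ? 1 : count + 1);
            }
          }
        }
      }
      return partitionCount;
    }

The pending variant would first project each Message to its `toState` before
calling the helper; the current-state variant could pass its map through
directly.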




[jira] [Commented] (HELIX-655) Helix per-participant concurrent task throttling

2017-05-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/HELIX-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005451#comment-16005451 ]

ASF GitHub Bot commented on HELIX-655:
--

Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115851381
  
--- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java 
---
@@ -424,9 +425,15 @@ private ResourceAssignment computeResourceMapping(String jobResource,
         .contains(instance)) {
       continue;
     }
+    // 1. throttled by job configuration
     // Contains the set of task partitions currently assigned to the instance.
     Set<Integer> pSet = entry.getValue();
-    int numToAssign = jobCfg.getNumConcurrentTasksPerInstance() - pSet.size();
+    int jobCfgLimitation = jobCfg.getNumConcurrentTasksPerInstance() - pSet.size();
--- End diff --

Would it be better to extract this small part into a separate method, e.g. 
`int numToAssign = getNumToAssign(...)`? This method is already too long.
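
A sketch of the suggested extraction (the signature is hypothetical; the real
inputs would be whatever computeResourceMapping already has in scope):

    private int getNumToAssign(JobConfig jobCfg, Set<Integer> assignedPartitions,
        int participantCapacity) {
      // 1. throttled by job configuration
      int jobCfgLimitation = jobCfg.getNumConcurrentTasksPerInstance() - assignedPartitions.size();
      // 2. assumption: also throttled by the remaining per-participant capacity
      //    that this PR's throttling introduces
      return Math.min(jobCfgLimitation, participantCapacity);
    }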










[GitHub] helix pull request #89: [HELIX-655] Helix per-participant concurrent task th...

2017-05-10 Thread kongweihan
Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115853776
  
--- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java 
---
@@ -704,4 +712,30 @@ private PartitionAssignment(String instance, String state) {
       _state = state;
     }
   }
+
+  /**
+   * Reset RUNNING/INIT tasks count in JobRebalancer
+   */
+  public static void resetActiveTaskCount(Collection<String> liveInstances, CurrentStateOutput currentStateOutput) {
+    // init participant map
+    for (String liveInstance : liveInstances) {
+      participantActiveTaskCount.put(liveInstance, 0);
+    }
+    // Active task == init and running tasks
+    fillActiveTaskCount(currentStateOutput.getPartitionCountWithPendingState(TaskConstants.STATE_MODEL_NAME,
--- End diff --

Is there any case where `INIT` is the toState?




[GitHub] helix pull request #89: [HELIX-655] Helix per-participant concurrent task th...

2017-05-10 Thread kongweihan
Github user kongweihan commented on a diff in the pull request:

https://github.com/apache/helix/pull/89#discussion_r115847749
  
--- Diff: 
helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java
 ---
@@ -260,13 +260,72 @@ public Message getPendingState(String resourceName, Partition partition, String
     return partitionSet;
   }
 
+  /**
+   * Get the partitions count for each participant with the pending state and given resource state model
+   * @param resourceStateModel specified resource state model to look up
+   * @param state specified pending resource state to look up
+   * @return map of participant to partition count
+   */
+  public Map<String, Integer> getPartitionCountWithPendingState(String resourceStateModel, String state) {
+    Map<String, Integer> pendingPartitionCount = new HashMap<String, Integer>();
+    for (String resource : _pendingStateMap.keySet()) {
+      String stateModel = _resourceStateModelMap.get(resource);
+      if (stateModel != null && stateModel.equals(resourceStateModel)
--- End diff --

IMHO it's generally better to group conditions together so that people 
don't need to google and confirm that "&& takes precedence over ||".
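
For illustration, the same check with explicit grouping, so the reader never
has to recall that `&&` binds tighter than `||`:

    if ((stateModel != null && stateModel.equals(resourceStateModel))
        || (stateModel == null && resourceStateModel == null)) {
      // ...
    }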






Re: Generate Helix release 0.6.8

2017-05-10 Thread Xue Junkai
Yes. I have the PR for this: https://github.com/apache/helix/pull/91

Best,

Junkai

On Wed, May 10, 2017 at 12:08 PM, kishore g  wrote:

> Yes. Do you have a PR for that? I can review it.
>
> On Wed, May 10, 2017 at 11:19 AM, Xue Junkai  wrote:
>
> > Sure! Please let me know if this change works or not. BTW, will the
> > customized batch message threadpool be included in this release?
> >
> > Best,
> >
> > Junkai
> >
> > On Tue, May 9, 2017 at 7:28 PM, kishore g  wrote:
> >
> > > I would like to have that fix included for Pinot. I will test the patch.
> > >
> > > On Tue, May 9, 2017 at 5:59 PM, Xue Junkai 
> wrote:
> > >
> > > > It does contain the batchMessage thread pool fix. But for race
> > condition
> > > > fix I withdraw the pull request since I am not quite sure whether the
> > fix
> > > > works or not. In addition, this release will include the
> > > > AutoRebalanceStrategy not assign replicas fix.
> > > >
> > > >
> > > > Best,
> > > >
> > > > Junkai
> > > >
> > > > On Tue, May 9, 2017 at 5:49 PM, kishore g 
> wrote:
> > > >
> > > > > Does this include the batchMessage thread pool fix and the fix to the
> > > > > race condition?
> > > > >
> > > > > On Tue, May 9, 2017 at 5:08 PM, Xue Junkai 
> > > wrote:
> > > > >
> > > > > > Hi Helix Devs,
> > > > > >
> > > > > > I am going to work on releasing Helix 0.6.8 this week. Please let
> > me
> > > > know
> > > > > > if you have any questions, comments and concerns.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Junkai Xue
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Junkai Xue
> > > >
> > >
> >
> >
> >
> > --
> > Junkai Xue
> >
>



-- 
Junkai Xue


[GitHub] helix pull request #91: [Helix-656] Support customize batch state transition...

2017-05-10 Thread dasahcc
GitHub user dasahcc opened a pull request:

https://github.com/apache/helix/pull/91

[Helix-656] Support customize batch state transition thread pool

To better support batch message handling, we should make the batch state 
transition thread pool configurable. This config can be put in the cluster 
config.
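
For illustration only, a sketch of what reading such a setting could look
like; the config key name and default value are assumptions, not the names
this PR actually introduces:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.helix.ZNRecord;

    public class BatchPoolFromClusterConfig {
      public static ExecutorService createBatchPool(ZNRecord clusterConfigRecord) {
        // Fall back to a default pool size if the cluster config does not set one.
        int poolSize = clusterConfigRecord.getIntField("BATCH_MESSAGE_THREAD_POOL_SIZE", 8);
        return Executors.newFixedThreadPool(poolSize);
      }
    }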

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dasahcc/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/91.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #91


commit 6a78f7de0cdb298a6dccbf9802eb63204e983211
Author: Junkai Xue 
Date:   2017-05-10T19:18:41Z

[Helix-656] Support customize batch state transition thread pool

To better support batch message handling, we shall make batch state 
transition thread pool configurable.







[GitHub] helix pull request #90: [HELIX-631] Fix AutoRebalanceStrategy replica not as...

2017-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/90



