[jira] [Created] (HELIX-774) Helix process getting increased day by day
Mohanraj Tirougnaname created HELIX-774: --- Summary: Helix process getting increased day by day Key: HELIX-774 URL: https://issues.apache.org/jira/browse/HELIX-774 Project: Apache Helix Issue Type: Bug Components: helix-webapp-admin Affects Versions: 0.6.5 Environment: Linux Reporter: Mohanraj Tirougnaname Fix For: 0.6.5
Hi Team, We are using Helix for cluster load balancing. When the jBPM server is started, 3 Helix processes are added to the process list every day, and they take a lot of memory. I have included one of the Helix processes below; please help me fix this.
jboss 31027 1 0 Oct19 ? 00:04:27 /u01/app/mw/jdk1.8.0_121/bin/java -Xms512m -Xmx512m -classpath /u01/app/mw/prod_zk/helix/conf /u01/app/mw/prod_zk/heli /repo/log4j/log4j/1.2.15/log4j-1.2.15.jar /u01/app/mw/prod_zk/helix/repo/org/apache/zookeeper/zookeeper/3.3.4/zookeeper-3.3.4.jar /u01/app/mw/prod_zk/helix/repo/jline/jline/0.9.94/jline-0.9.94.jar /u01/app/mw/prod_zk/helix/repo/org/codehaus/jackson/jackson-core-asl/1.8.5/jackson-core-asl-1.8.5.jar /u01/app/mw/prod_zk/helix/repo/org/codehaus/jackson/jackson-mapper-asl/1.8.5/jackson-mapper-asl-1.8.5.jar /u01/app/mw/prod_zk/helix/repo/commons-io/commons-io/1.4/commons-io-1.4.jar /u01/app/mw/prod_zk/helix/repo/commons-cli/commons-cli/1.2/commons-cli-1.2.jar /u01/app/mw/prod_zk/helix/repo/com/github/sgroschupf/zkclient/0.1/zkclient-0.1.jar /u01/app/mw/prod_zk/helix/repo/org/apache/commons/commons-math/2.1/commons-math-2.1.jar /u01/app/mw/prod_zk/helix/repo/commons-codec/commons-codec/1.6/commons-codec-1.6.jar /u01/app/mw/prod_zk/helix/repo/com/google/guava/guava/15.0/guava-15.0.jar /u01/app/mw/prod_zk/helix/repo/org/yaml/snakeyaml/1.12/snakeyaml-1.12.jar /u01/app/mw/prod_zk/helix/repo/org/apache/helix/helix-core/0.6.5/helix-core-0.6.5.jar -Dapp.name=run-helix-controller -Dapp.pid=31027 -Dapp.repo=/u01/app/mw/prod_zk/helix/repo -Dbasedir=/u01/app/mw/prod_zk/helix org.apache.helix.controller.HelixControllerMain --zkSvr 204.26.160.42:4181,204.26.160.43:4181 --cluster repoCluster3
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-772) Support TaskDriver.addUserContent() api
[ https://issues.apache.org/jira/browse/HELIX-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670698#comment-16670698 ] ASF GitHub Bot commented on HELIX-772: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/280 > Support TaskDriver.addUserContent() api > --- > > Key: HELIX-772 > URL: https://issues.apache.org/jira/browse/HELIX-772 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support adding user content in task driver > > AC: > * implement API > * add test > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-773) Support getLastScheduledTaskTimestamp information in workflow rest api
[ https://issues.apache.org/jira/browse/HELIX-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670702#comment-16670702 ] ASF GitHub Bot commented on HELIX-773: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/281 > Support getLastScheduledTaskTimestamp information in workflow rest api > -- > > Key: HELIX-773 > URL: https://issues.apache.org/jira/browse/HELIX-773 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Support getLastScheduledTaskTimestamp information in workflow rest api -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] helix pull request #281: [HELIX-773] add getLastScheduledTaskTimestamp infor...
Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/281 ---
[GitHub] helix pull request #283: [HELIX-775] consolidate user content related apis f...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/283 [HELIX-775] consolidate user content related apis for task driver
HELIX-1315: consolidate user content related apis for task driver
To consolidate the task driver's user content related APIs and the corresponding REST APIs, I'm deprecating the general getUserContent() API. Instead, we now have the following APIs to get / add / update user content.
```java
// Add or update user content at the workflow, job, or task level
public void addOrUpdateWorkflowUserContentMap(String workflowName, final Map contentToAddOrUpdate);
public void addOrUpdateJobUserContentMap(String workflowName, String jobName, final Map contentToAddOrUpdate);
public void addOrUpdateTaskUserContentMap(String workflowName, String jobName, String taskPartitionId, final Map contentToAddOrUpdate);

// Read user content at the workflow, job, or task level
public Map getWorkflowUserContentMap(String workflowName);
public Map getJobUserContentMap(String workflowName, String jobName);
public Map getTaskUserContentMap(String workflowName, String jobName, String taskPartitionId);
```
The API for deleting user content is TBD but can use the same convention.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-user-content
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/283.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #283
commit b235c4ee5a82c5970d29e839317ea242813a58bc Author: Harry Zhang Date: 2018-10-04T18:25:08Z [HELIX-775] consolidate user content related apis for task driver ---
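For readers following the thread, the scoping in the method list above can be sketched with a small in-memory model. This is not Helix code: the class `UserContentSketch`, its list-based scope key, and the `Map<String, String>` type parameters are assumptions made for illustration (the signatures in the message show a raw `Map`). It only shows add-or-update and get behaving independently at the workflow, job, and task levels.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the three user content scopes; not Helix code. */
class UserContentSketch {
  // Scope key: [workflow], [workflow, job] or [workflow, job, taskPartitionId].
  private final Map<List<String>, Map<String, String>> contentByScope = new HashMap<>();

  public void addOrUpdateWorkflowUserContentMap(String workflowName,
      final Map<String, String> contentToAddOrUpdate) {
    addOrUpdate(Arrays.asList(workflowName), contentToAddOrUpdate);
  }

  public void addOrUpdateJobUserContentMap(String workflowName, String jobName,
      final Map<String, String> contentToAddOrUpdate) {
    addOrUpdate(Arrays.asList(workflowName, jobName), contentToAddOrUpdate);
  }

  public void addOrUpdateTaskUserContentMap(String workflowName, String jobName,
      String taskPartitionId, final Map<String, String> contentToAddOrUpdate) {
    addOrUpdate(Arrays.asList(workflowName, jobName, taskPartitionId), contentToAddOrUpdate);
  }

  public Map<String, String> getWorkflowUserContentMap(String workflowName) {
    return get(Arrays.asList(workflowName));
  }

  public Map<String, String> getJobUserContentMap(String workflowName, String jobName) {
    return get(Arrays.asList(workflowName, jobName));
  }

  public Map<String, String> getTaskUserContentMap(String workflowName, String jobName,
      String taskPartitionId) {
    return get(Arrays.asList(workflowName, jobName, taskPartitionId));
  }

  // New keys are added and existing keys are overwritten; other keys are left untouched.
  private void addOrUpdate(List<String> scope, Map<String, String> contentToAddOrUpdate) {
    contentByScope.computeIfAbsent(scope, key -> new HashMap<>()).putAll(contentToAddOrUpdate);
  }

  // Returns a copy so callers cannot mutate the stored content; empty if the scope is unknown.
  private Map<String, String> get(List<String> scope) {
    Map<String, String> copy = new HashMap<>();
    Map<String, String> stored = contentByScope.get(scope);
    if (stored != null) {
      copy.putAll(stored);
    }
    return copy;
  }

  public static void main(String[] args) {
    UserContentSketch driver = new UserContentSketch();
    Map<String, String> content = new HashMap<>();
    content.put("owner", "team-a");
    driver.addOrUpdateWorkflowUserContentMap("wf1", content);
    content.put("owner", "team-b");
    driver.addOrUpdateJobUserContentMap("wf1", "job1", content);
    System.out.println(driver.getWorkflowUserContentMap("wf1").get("owner"));
    System.out.println(driver.getJobUserContentMap("wf1", "job1").get("owner"));
  }
}
```

Keying each scope by its full path keeps workflow-level and job-level entries with the same key name from overwriting each other, which is the behaviour the separate per-level methods suggest.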
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670739#comment-16670739 ] Hudson commented on HELIX-775: -- FAILURE: Integrated in Jenkins build helix #1560 (See [https://builds.apache.org/job/helix/1560/]) [HELIX-775] consolidate user content related apis for task driver (hrzhang: rev b235c4ee5a82c5970d29e839317ea242813a58bc) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (edit) helix-core/src/test/java/org/apache/helix/task/TestGetSetUserContentStore.java > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
helix - Build # 1560 - Still Failing
The Apache Jenkins build system has built helix (build #1560) Status: Still Failing Check console output at https://builds.apache.org/job/helix/1560/ to view the results.
[jira] [Commented] (HELIX-772) Support TaskDriver.addUserContent() api
[ https://issues.apache.org/jira/browse/HELIX-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670717#comment-16670717 ] Hudson commented on HELIX-772: -- FAILURE: Integrated in Jenkins build helix #1558 (See [https://builds.apache.org/job/helix/1558/]) [HELIX-772] add TaskDriver.addUserContent() api and related tests (hrzhang: rev 0c251bbf640206729755301c3dda734eea78343f) * (add) helix-core/src/test/java/org/apache/helix/task/TestGetSetUserContentStore.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java * (delete) helix-core/src/test/java/org/apache/helix/task/TestGetUserContentStore.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java > Support TaskDriver.addUserContent() api > --- > > Key: HELIX-772 > URL: https://issues.apache.org/jira/browse/HELIX-772 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support adding user content in task driver > > AC: > * implement API > * add test > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-773) Support getLastScheduledTaskTimestamp information in workflow rest api
[ https://issues.apache.org/jira/browse/HELIX-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670718#comment-16670718 ] Hudson commented on HELIX-773: -- FAILURE: Integrated in Jenkins build helix #1558 (See [https://builds.apache.org/job/helix/1558/]) [HELIX-773] add getLastScheduledTaskTimestamp information in workflow (hrzhang: rev 566d4f166473b477ea0db1cfba5d04c8f3d6bf30) * (add) helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskExecInfo.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/WorkflowAccessor.java * (delete) helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskTimestamp.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (add) helix-core/src/main/java/org/apache/helix/task/TaskExecutionInfo.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestWorkflowAccessor.java > Support getLastScheduledTaskTimestamp information in workflow rest api > -- > > Key: HELIX-773 > URL: https://issues.apache.org/jira/browse/HELIX-773 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Support getLastScheduledTaskTimestamp information in workflow rest api -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] helix pull request #283: [HELIX-775] consolidate user content related apis f...
Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/283 ---
[jira] [Created] (HELIX-776) REST2.0: Add delete command to updateInstanceConfig
Hunter L created HELIX-776: -- Summary: REST2.0: Add delete command to updateInstanceConfig Key: HELIX-776 URL: https://issues.apache.org/jira/browse/HELIX-776 Project: Apache Helix Issue Type: Improvement Reporter: Hunter L Assignee: Hunter L For instance configs, REST2.0 did not expose the REST API for deletion of fields. This RB adds update and delete commands to updateInstanceConfig and an integration test thereof. Changelist: 1. Add delete command to updateInstanceConfig in InstanceAccessor 2. Add integration tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
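As a rough illustration of the update and delete commands described above, the following sketch applies a command to a flat map of config fields. It is not the InstanceAccessor implementation: the class `InstanceConfigCommandSketch`, the `Command` enum, and the choice to model only flat string fields are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of update/delete commands on instance config fields; not Helix code. */
class InstanceConfigCommandSketch {
  enum Command { UPDATE, DELETE }

  /**
   * Returns a new field map with the command applied.
   * UPDATE adds or overwrites the payload's fields; DELETE removes the fields named in the payload.
   */
  static Map<String, String> apply(Command command, Map<String, String> current,
      Map<String, String> payload) {
    Map<String, String> result = new HashMap<>(current);
    switch (command) {
      case UPDATE:
        result.putAll(payload);
        break;
      case DELETE:
        result.keySet().removeAll(payload.keySet());
        break;
      default:
        throw new IllegalArgumentException("Unsupported command: " + command);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put("HELIX_ENABLED", "true");

    Map<String, String> payload = new HashMap<>();
    payload.put("ZONE", "zone-1");

    Map<String, String> updated = apply(Command.UPDATE, config, payload);
    Map<String, String> deleted = apply(Command.DELETE, updated, payload);
    System.out.println(updated.containsKey("ZONE"));
    System.out.println(deleted.containsKey("ZONE"));
  }
}
```

Returning a new map instead of mutating the input keeps the two commands easy to compare in an integration-style test.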
[jira] [Created] (HELIX-777) TASK: Handle null currentState for unscheduled tasks
Hunter L created HELIX-777: -- Summary: TASK: Handle null currentState for unscheduled tasks Key: HELIX-777 URL: https://issues.apache.org/jira/browse/HELIX-777 Project: Apache Helix Issue Type: Improvement Reporter: Hunter L Assignee: Hunter L It was observed that when a workflow is submitted and the Controller attempts to schedule its tasks, ZK read fails to read the appropriate job's context, causing the job to be stuck in an unscheduled state. The job remained unscheduled because it had no currentStates, and its job context did not contain any assignment/state information. This RB fixes such stuck states by detecting null currentStates. Changelist: 1. Check if currentState is null and if it is, manually assign an INIT state -- This message was sent by Atlassian JIRA (v7.6.3#76005)
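The null check in the changelist can be sketched as follows. This is a minimal illustration, not the AbstractTaskDispatcher change itself: `NullCurrentStateSketch`, the `SketchTaskState` enum, and `effectiveState` are hypothetical names, and the enum lists only a few states for the example.

```java
/** Hypothetical illustration of defaulting a missing current state to INIT; not Helix code. */
class NullCurrentStateSketch {
  // A small stand-in for task states; the set of values is an assumption of this sketch.
  enum SketchTaskState { INIT, RUNNING, COMPLETED }

  /** Treats a task with no recorded current state as INIT so it can still be scheduled. */
  static SketchTaskState effectiveState(SketchTaskState currentState) {
    return currentState == null ? SketchTaskState.INIT : currentState;
  }

  public static void main(String[] args) {
    System.out.println(effectiveState(null));
    System.out.println(effectiveState(SketchTaskState.RUNNING));
  }
}
```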
[GitHub] helix pull request #282: [HELIX-775] add task driver support for helix rest ...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/282 [HELIX-775] add task driver support for helix rest to add/get task fr… …amework user content
consolidate user content related apis for task driver
To consolidate the task driver's user content related APIs and the corresponding REST APIs, I'm deprecating the general getUserContent() API. Instead, we now have the following APIs to get / add / update user content.
```java
// Add or update user content at the workflow, job, or task level
public void addOrUpdateWorkflowUserContentMap(String workflowName, final Map contentToAddOrUpdate);
public void addOrUpdateJobUserContentMap(String workflowName, String jobName, final Map contentToAddOrUpdate);
public void addOrUpdateTaskUserContentMap(String workflowName, String jobName, String taskPartitionId, final Map contentToAddOrUpdate);

// Read user content at the workflow, job, or task level
public Map getWorkflowUserContentMap(String workflowName);
public Map getJobUserContentMap(String workflowName, String jobName);
public Map getTaskUserContentMap(String workflowName, String jobName, String taskPartitionId);
```
API for deleting user content is TBD but can use the same convention.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-user-content
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/282.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #282
commit 7ec5313bccb679014d6a0605ee5d7184063e555e Author: Harry Zhang Date: 2018-10-31T20:55:44Z [HELIX-775] add task driver support for helix rest to add/get task framework user content ---
helix - Build # 1559 - Still Failing
The Apache Jenkins build system has built helix (build #1559) Status: Still Failing Check console output at https://builds.apache.org/job/helix/1559/ to view the results.
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670734#comment-16670734 ] ASF GitHub Bot commented on HELIX-775: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/283 > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670714#comment-16670714 ] ASF GitHub Bot commented on HELIX-775: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/282 > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
helix - Build # 1558 - Still Failing
The Apache Jenkins build system has built helix (build #1558) Status: Still Failing Check console output at https://builds.apache.org/job/helix/1558/ to view the results.
[jira] [Created] (HELIX-775) Task driver should support add/get task framework user content
Harry Zhang created HELIX-775: - Summary: Task driver should support add/get task framework user content Key: HELIX-775 URL: https://issues.apache.org/jira/browse/HELIX-775 Project: Apache Helix Issue Type: Task Reporter: Harry Zhang Assignee: Harry Zhang Task driver should support add/get task framework user content at workflow/job/task levels AC: * finish implementation * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
helix - Build # 1561 - Still Failing
The Apache Jenkins build system has built helix (build #1561) Status: Still Failing Check console output at https://builds.apache.org/job/helix/1561/ to view the results.
[GitHub] helix pull request #284: PR
GitHub user narendly opened a pull request: https://github.com/apache/helix/pull/284 PR
You can merge this pull request into a Git repository by running: $ git pull https://github.com/narendly/helix master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/284.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #284
commit 6090732be6b88863017a93106fa692dc7350520b Author: Hunter Lee Date: 2018-10-31T21:20:18Z
[HELIX-776] REST2.0: Add delete command to updateInstanceConfig
For instance configs, REST2.0 did not expose the REST API for deletion of fields. This RB adds update and delete commands to updateInstanceConfig and an integration test thereof.
Changelist:
1. Add delete command to updateInstanceConfig in InstanceAccessor
2. Add integration tests
commit 5d24ed544898ff69f289f54be71a04413735d118 Author: Hunter Lee Date: 2018-10-31T21:21:49Z
[HELIX-777] TASK: Handle null currentState for unscheduled tasks
It was observed that when a workflow is submitted and the Controller attempts to schedule its tasks, ZK read fails to read the appropriate job's context, causing the job to be stuck in an unscheduled state. The job remained unscheduled because it had no currentStates, and its job context did not contain any assignment/state information. This RB fixes such stuck states by detecting null currentStates.
Changelist:
1. Check if currentState is null and if it is, manually assign an INIT state
commit ceba1a55ae351090144c001324f908f2364212a4 Author: Hunter Lee Date: 2018-11-01T00:20:37Z
[HELIX-778] TASK: Fix a race condition in updatePreviousAssignedTasksStatus
It was observed that TestUnregisteredCommand is very unstable. The reason was identified to be a race condition: when a task fails, sometimes a pending message for that task (from INIT to RUNNING) wasn't being cleaned up in time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would try to process that message and skip the status update of that task (like updating its status and NUM_ATTEMPTS field in JobContext). A short, temporary fix is to call markPartitionError() prior to checking the pending message, but over the long haul, we would need to revisit the task status update's design here to avoid this type of race condition.
Changelist:
1. Move markPartitionError() up before checking for a pending message on the task
2. Fix TestUnregisteredCommand's instability
---
[jira] [Created] (HELIX-778) TASK: Fix a race condition in updatePreviousAssignedTasksStatus
Hunter L created HELIX-778: -- Summary: TASK: Fix a race condition in updatePreviousAssignedTasksStatus Key: HELIX-778 URL: https://issues.apache.org/jira/browse/HELIX-778 Project: Apache Helix Issue Type: Improvement Reporter: Hunter L Assignee: Hunter L It was observed that TestUnregisteredCommand is very unstable. The reason was identified to be a race condition: when a task fails, sometimes a pending message for that task (from INIT to RUNNING) wasn't being cleaned up in time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would try to process that message and skip the status update of that task (like updating its status and NUM_ATTEMPTS field in JobContext). A short, temporary fix is to call markPartitionError() prior to checking the pending message, but over the long haul, we would need to revisit the task status update's design here to avoid this type of race condition. Changelist: 1. Move markPartitionError() up before checking for a pending message on the task 2. Fix TestUnregisteredCommand's instability -- This message was sent by Atlassian JIRA (v7.6.3#76005)
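The reordering in the changelist can be illustrated with a simplified model. This is only a sketch of the ordering problem, not the AbstractTaskDispatcher code: the class `StatusUpdateOrderSketch`, its two `update...` methods, the `TASK_ERROR` string, and this version of `markPartitionError` are hypothetical. It shows that an early return on a pending message skips the error marking unless the marking happens first.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the check ordering described in HELIX-778; not Helix code. */
class StatusUpdateOrderSketch {
  // Recorded state per task partition; a missing entry means no status update happened.
  final Map<Integer, String> partitionState = new HashMap<>();

  // Stand-in for the real method: here it only records an error state for the partition.
  void markPartitionError(int partition) {
    partitionState.put(partition, "TASK_ERROR");
  }

  // Ordering before the fix: a stale pending message causes an early return,
  // so a failed task never gets its error state recorded.
  void updateBeforeFix(int partition, boolean taskFailed, boolean hasPendingMessage) {
    if (hasPendingMessage) {
      return;
    }
    if (taskFailed) {
      markPartitionError(partition);
    }
  }

  // Ordering after the fix: the failure is recorded first, then the pending message is checked.
  void updateAfterFix(int partition, boolean taskFailed, boolean hasPendingMessage) {
    if (taskFailed) {
      markPartitionError(partition);
    }
    if (hasPendingMessage) {
      return;
    }
    // Any remaining status processing for the partition would go here.
  }

  public static void main(String[] args) {
    StatusUpdateOrderSketch before = new StatusUpdateOrderSketch();
    before.updateBeforeFix(0, true, true);
    System.out.println(before.partitionState.get(0));

    StatusUpdateOrderSketch after = new StatusUpdateOrderSketch();
    after.updateAfterFix(0, true, true);
    System.out.println(after.partitionState.get(0));
  }
}
```

As the message notes, moving the call earlier is a temporary measure; the sketch does not address the underlying design of the status update.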
[GitHub] helix pull request #284: PR
Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/284 ---
[jira] [Commented] (HELIX-778) TASK: Fix a race condition in updatePreviousAssignedTasksStatus
[ https://issues.apache.org/jira/browse/HELIX-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670955#comment-16670955 ] Hudson commented on HELIX-778: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-778] TASK: Fix a race condition in (hulee: rev ceba1a55ae351090144c001324f908f2364212a4) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestUnregisteredCommand.java * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java > TASK: Fix a race condition in updatePreviousAssignedTasksStatus > --- > > Key: HELIX-778 > URL: https://issues.apache.org/jira/browse/HELIX-778 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that TestUnregisteredCommand is very unstable. The reason was > identified to be a race condition: when a task fails, sometimes a > pending message for that task (from INIT to RUNNING) wasn't being cleaned up > in time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would > try to process that message and skip the status update of that task (like > updating its status and NUM_ATTEMPTS field in JobContext). > A short, temporary fix is to call markPartitionError() prior to checking the > pending message, but over the long haul, we would need to revisit the task > status update's design here to avoid this type of race condition. > Changelist: > 1. Move markPartitionError() up before checking for a pending message on the > task > 2. Fix TestUnregisteredCommand's instability -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-777) TASK: Handle null currentState for unscheduled tasks
[ https://issues.apache.org/jira/browse/HELIX-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670954#comment-16670954 ] Hudson commented on HELIX-777: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-777] TASK: Handle null currentState for unscheduled tasks (hulee: rev 5d24ed544898ff69f289f54be71a04413735d118) * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java > TASK: Handle null currentState for unscheduled tasks > > > Key: HELIX-777 > URL: https://issues.apache.org/jira/browse/HELIX-777 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that when a workflow is submitted and the Controller attempts > to schedule its tasks, ZK read fails to read the appropriate job's context, > causing the job to be stuck in an unscheduled state. The job remained > unscheduled because it had no currentStates, and its job context did not > contain any assignment/state information. This RB fixes such stuck states by > detecting null currentStates. > Changelist: > 1. Check if currentState is null and if it is, manually assign an INIT state -- This message was sent by Atlassian JIRA (v7.6.3#76005)