[jira] [Created] (HELIX-774) Helix process getting increased day by day
Mohanraj Tirougnaname created HELIX-774: --- Summary: Helix process getting increased day by day Key: HELIX-774 URL: https://issues.apache.org/jira/browse/HELIX-774 Project: Apache Helix Issue Type: Bug Components: helix-webapp-admin Affects Versions: 0.6.5 Environment: Linux Reporter: Mohanraj Tirougnaname Fix For: 0.6.5 Hi Team, We are using helix in cluster Load balancing. While starting jbpm server the 3 helix process daily getting added in the process and takes lot of memory. Below I have added the helix process, please help me to fix this. Please refer below process jboss 31027 1 0 Oct19 ? 00:04:27 /u01/app/mw/jdk1.8.0_121/bin/java -Xms512m -Xmx512m -classpath /u01/app/mw/prod_zk/helix/conf /u01/app/mw/prod_zk/heli /repo/log4j/log4j/1.2.15/log4j-1.2.15.jar /u01/app/mw/prod_zk/helix/repo/org/apache/zookeeper/zookeeper/3.3.4/zookeeper-3.3.4.jar /u01/app/mw/prod_zk/helix/repo/jline/jline/0.9.94/jline-0.9.94.jar /u01/app/mw/prod_zk/helix/repo/org/codehaus/jackson/jackson-core-asl/1.8.5/jackson-core-asl-1.8.5.jar /u01/app/mw/prod_zk/helix/repo/org/codehaus/jackson/jackson-mapper-asl/1.8.5/jackson-mapper-asl-1.8.5.jar /u01/app/mw/prod_zk/helix/repo/commons-io/commons-io/1.4/commons-io-1.4.jar /u01/app/mw/prod_zk/helix/repo/commons-cli/commons-cli/1.2/commons-cli-1.2.jar /u01/app/mw/prod_zk/helix/repo/com/github/sgroschupf/zkclient/0.1/zkclient-0.1.jar /u01/app/mw/prod_zk/helix/repo/org/apache/commons/commons-math/2.1/commons-math-2.1.jar /u01/app/mw/prod_zk/helix/repo/commons-codec/commons-codec/1.6/commons-codec-1.6.jar /u01/app/mw/prod_zk/helix/repo/com/google/guava/guava/15.0/guava-15.0.jar /u01/app/mw/prod_zk/helix/repo/org/yaml/snakeyaml/1.12/snakeyaml-1.12.jar /u01/app/mw/prod_zk/helix/repo/org/apache/helix/helix-core/0.6.5/helix-core-0.6.5.jar -Dapp.name=run-helix-controller -Dapp.pid=31027 -Dapp.repo=/u01/app/mw/prod_zk/helix/repo -Dbasedir=/u01/app/mw/prod_zk/helix org.apache.helix.controller.HelixControllerMain --zkSvr 204.26.160.42:4181,204.26.160.43:4181 --cluster repoCluster3 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
helix git commit: [HELIX-772] add TaskDriver.addUserContent() api and related tests
Repository: helix Updated Branches: refs/heads/master 0beeb8fa2 -> 0c251bbf6 [HELIX-772] add TaskDriver.addUserContent() api and related tests Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/0c251bbf Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/0c251bbf Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/0c251bbf Branch: refs/heads/master Commit: 0c251bbf640206729755301c3dda734eea78343f Parents: 0beeb8f Author: Harry Zhang Authored: Tue Oct 30 16:25:12 2018 -0700 Committer: Harry Zhang Committed: Wed Oct 31 13:47:52 2018 -0700 -- .../java/org/apache/helix/task/TaskDriver.java | 46 - .../java/org/apache/helix/task/TaskUtil.java| 52 +++-- .../task/TestIndependentTaskRebalancer.java | 43 ++-- .../helix/task/TestGetSetUserContentStore.java | 206 +++ .../helix/task/TestGetUserContentStore.java | 144 - 5 files changed, 309 insertions(+), 182 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/0c251bbf/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java index ea529e8..e6256ed 100644 --- a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java +++ b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java @@ -19,8 +19,26 @@ package org.apache.helix.task; * under the License. */ +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; import org.I0Itec.zkclient.DataUpdater; -import org.apache.helix.*; +import org.apache.helix.AccessOption; +import org.apache.helix.BaseDataAccessor; +import org.apache.helix.ConfigAccessor; +import org.apache.helix.HelixAdmin; +import org.apache.helix.HelixDataAccessor; +import org.apache.helix.HelixException; +import org.apache.helix.HelixManager; +import org.apache.helix.PropertyKey; +import org.apache.helix.PropertyPathBuilder; +import org.apache.helix.SystemPropertyKeys; +import org.apache.helix.ZNRecord; import org.apache.helix.controller.rebalancer.util.RebalanceScheduler; import org.apache.helix.manager.zk.ZKHelixAdmin; import org.apache.helix.manager.zk.ZKHelixDataAccessor; @@ -35,8 +53,6 @@ import org.apache.helix.util.HelixUtil; import org.slf4j.Logger; import org.slf4j.LoggerFactory; -import java.util.*; - /** * CLI for scheduling/canceling workflows */ @@ -1003,6 +1019,30 @@ public class TaskDriver { } /** + * Set user content defined by the given key and string + * @param key content key + * @param value content value + * @param workflowName name of the workflow - must provide when scope is WORKFLOW + * @param jobName name of the job - must provide when scope is JOB or TASK + * @param taskName name of the task - must provide when scope is TASK + * @param scope scope of the content + */ + public void addUserContent(String key, String value, String workflowName, String jobName, String taskName, + UserContentStore.Scope scope) { +switch (scope) { +case WORKFLOW: + TaskUtil.addWorkflowJobUserContent(_propertyStore, workflowName, key, value); + break; +case JOB: + TaskUtil.addWorkflowJobUserContent(_propertyStore, jobName, key, value); + break; +default: + TaskUtil.addTaskUserContent(_propertyStore, jobName, taskName, key, value); + break; +} + } + + /** * Throw Exception if children nodes will exceed limitation after adding newNodesCount children. * @param newConfigNodeCount */ http://git-wip-us.apache.org/repos/asf/helix/blob/0c251bbf/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java b/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java index 1ce448c..3461233 100644 --- a/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java +++ b/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java @@ -19,7 +19,6 @@ package org.apache.helix.task; * under the License. */ -import com.google.common.collect.Sets; import java.io.IOException; import java.util.Collections; import java.util.HashMap; @@ -41,12 +40,13 @@ import org.apache.helix.model.HelixConfigScope; import org.apache.helix.model.ResourceConfig; import org.apache.helix.model.builder.HelixConfigScopeBuilder; import org.apache.helix.store.HelixPropertyStore; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; import org.codehaus.jackson.ma
[jira] [Commented] (HELIX-772) Support TaskDriver.addUserContent() api
[ https://issues.apache.org/jira/browse/HELIX-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670698#comment-16670698 ] ASF GitHub Bot commented on HELIX-772: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/280 > Support TaskDriver.addUserContent() api > --- > > Key: HELIX-772 > URL: https://issues.apache.org/jira/browse/HELIX-772 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support add user content in task driver > > AC: > * implement APi > * add test > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
helix git commit: [HELIX-773] add getLastScheduledTaskTimestamp information in workflow rest api
Repository: helix Updated Branches: refs/heads/master 0c251bbf6 -> 566d4f166 [HELIX-773] add getLastScheduledTaskTimestamp information in workflow rest api Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/566d4f16 Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/566d4f16 Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/566d4f16 Branch: refs/heads/master Commit: 566d4f166473b477ea0db1cfba5d04c8f3d6bf30 Parents: 0c251bb Author: Harry Zhang Authored: Tue Oct 30 16:43:25 2018 -0700 Committer: Harry Zhang Committed: Wed Oct 31 13:48:46 2018 -0700 -- .../java/org/apache/helix/task/TaskDriver.java | 22 +++- .../apache/helix/task/TaskExecutionInfo.java| 65 ++ .../task/TestGetLastScheduledTaskExecInfo.java | 122 +++ .../task/TestGetLastScheduledTaskTimestamp.java | 110 - .../resources/helix/WorkflowAccessor.java | 5 +- .../helix/rest/server/TestWorkflowAccessor.java | 9 +- 6 files changed, 213 insertions(+), 120 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/566d4f16/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java index e6256ed..54e3ab3 100644 --- a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java +++ b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java @@ -85,7 +85,6 @@ public class TaskDriver { // TODO Implement or configure the limitation in ZK server. private final static long DEFAULT_CONFIGS_LIMITATION = HelixUtil.getSystemPropertyAsLong(SystemPropertyKeys.TASK_CONFIG_LIMITATION, 10L); - private final static long TIMESTAMP_NOT_SET = -1L; private final static String TASK_START_TIME_KEY = "START_TIME"; protected long _configsLimitation = DEFAULT_CONFIGS_LIMITATION; @@ -977,14 +976,22 @@ public class TaskDriver { * -1L if timestamp is not set (either nothing is scheduled or no start time recorded). */ public long getLastScheduledTaskTimestamp(String workflowName) { -long lastScheduledTaskTimestamp = TIMESTAMP_NOT_SET; +return getLastScheduledTaskExecutionInfo(workflowName).getStartTimeStamp(); + } + + public TaskExecutionInfo getLastScheduledTaskExecutionInfo(String workflowName) { +long lastScheduledTaskTimestamp = TaskExecutionInfo.TIMESTAMP_NOT_SET; +String jobName = null; +Integer taskPartitionIndex = null; +TaskPartitionState state = null; + WorkflowContext workflowContext = getWorkflowContext(workflowName); if (workflowContext != null) { Map allJobStates = workflowContext.getJobStates(); - for (String job : allJobStates.keySet()) { -if (!allJobStates.get(job).equals(TaskState.NOT_STARTED)) { - JobContext jobContext = getJobContext(job); + for (Map.Entry jobState : allJobStates.entrySet()) { +if (!jobState.getValue().equals(TaskState.NOT_STARTED)) { + JobContext jobContext = getJobContext(jobState.getKey()); if (jobContext != null) { Set allPartitions = jobContext.getPartitionSet(); for (Integer partition : allPartitions) { @@ -993,6 +1000,9 @@ public class TaskDriver { long startTimeLong = Long.parseLong(startTime); if (startTimeLong > lastScheduledTaskTimestamp) { lastScheduledTaskTimestamp = startTimeLong; + jobName = jobState.getKey(); + taskPartitionIndex = partition; + state = jobContext.getPartitionState(partition); } } } @@ -1000,7 +1010,7 @@ public class TaskDriver { } } } -return lastScheduledTaskTimestamp; +return new TaskExecutionInfo(jobName, taskPartitionIndex, state, lastScheduledTaskTimestamp); } /** http://git-wip-us.apache.org/repos/asf/helix/blob/566d4f16/helix-core/src/main/java/org/apache/helix/task/TaskExecutionInfo.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskExecutionInfo.java b/helix-core/src/main/java/org/apache/helix/task/TaskExecutionInfo.java new file mode 100644 index 000..03d66b4 --- /dev/null +++ b/helix-core/src/main/java/org/apache/helix/task/TaskExecutionInfo.java @@ -0,0 +1,65 @@ +package org.apache.helix.task; + +import java.io.IOException; +import org.codehaus.jackson.annotate.JsonCreator; +import org.codehaus.jackson.annotate.JsonIgnoreProperties; +import org.codehaus.jackson.annotate.JsonProperty; +import org.codehaus.jackson.map.ObjectMapper; + +@JsonIgnorePrope
[jira] [Commented] (HELIX-773) Support getLastScheduledTaskTimestamp information in workflow rest api
[ https://issues.apache.org/jira/browse/HELIX-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670702#comment-16670702 ] ASF GitHub Bot commented on HELIX-773: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/281 > Support getLastScheduledTaskTimestamp information in workflow rest api > -- > > Key: HELIX-773 > URL: https://issues.apache.org/jira/browse/HELIX-773 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Support getLastScheduledTaskTimestamp information in workflow rest api -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[1/2] helix git commit: Customize the pipeline thread name with cluster name and pipeline type.
Repository: helix Updated Branches: refs/heads/master 566d4f166 -> 1103fecb6 Customize the pipeline thread name with cluster name and pipeline type. Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/96593708 Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/96593708 Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/96593708 Branch: refs/heads/master Commit: 9659370849116b8a3f5d1853c7695274942ce211 Parents: 566d4f1 Author: Lei Xia Authored: Tue Oct 2 14:43:25 2018 -0700 Committer: Junkai Xue Committed: Wed Oct 31 13:50:33 2018 -0700 -- .../controller/GenericHelixController.java | 30 +--- 1 file changed, 19 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/96593708/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java -- diff --git a/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java b/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java index dd409e5..eb75286 100644 --- a/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java +++ b/helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java @@ -356,7 +356,7 @@ public class GenericHelixController implements IdealStateChangeListener, } private GenericHelixController(PipelineRegistry registry, PipelineRegistry taskRegistry, - String clusterName) { + final String clusterName) { _paused = false; _registry = registry; _taskRegistry = taskRegistry; @@ -366,7 +366,7 @@ public class GenericHelixController implements IdealStateChangeListener, _asyncTasksThreadPool = Executors.newScheduledThreadPool(ASYNC_TASKS_THREADPOOL_SIZE, new ThreadFactory() { @Override public Thread newThread(Runnable r) { -return new Thread(r, "GerenricHelixController-async_task_thread"); +return new Thread(r, "HelixController-async_tasks-" + _clusterName); } }); @@ -378,8 +378,9 @@ public class GenericHelixController implements IdealStateChangeListener, _cache = new ClusterDataCache(clusterName); _taskCache = new ClusterDataCache(clusterName); -_eventThread = new ClusterEventProcessor(_cache, _eventQueue); -_taskEventThread = new ClusterEventProcessor(_taskCache, _taskEventQueue); +_eventThread = new ClusterEventProcessor(_cache, _eventQueue, "default-" + clusterName); +_taskEventThread = +new ClusterEventProcessor(_taskCache, _taskEventQueue, "task-" + clusterName); _forceRebalanceTimer = new Timer(); _lastPipelineEndTimestamp = TopStateHandoffReportStage.TIMESTAMP_NOT_RECORDED; @@ -979,33 +980,40 @@ public class GenericHelixController implements IdealStateChangeListener, private class ClusterEventProcessor extends Thread { private final ClusterDataCache _cache; private final ClusterEventBlockingQueue _eventBlockingQueue; +private final String _processorName; public ClusterEventProcessor(ClusterDataCache cache, -ClusterEventBlockingQueue eventBlockingQueue) { - super("GenericHelixController-event_process"); +ClusterEventBlockingQueue eventBlockingQueue, String processorName) { + super("HelixController-pipeline-" + processorName); _cache = cache; _eventBlockingQueue = eventBlockingQueue; + _processorName = processorName; } @Override public void run() { - logger.info("START ClusterEventProcessor thread for cluster " + _clusterName); + logger.info( + "START ClusterEventProcessor thread for cluster " + _clusterName + ", processor name: " + + _processorName); while (!isInterrupted()) { try { handleEvent(_eventBlockingQueue.take(), _cache); } catch (InterruptedException e) { - logger.warn("ClusterEventProcessor interrupted", e); + logger.warn("ClusterEventProcessor interrupted " + _processorName, e); interrupt(); } catch (ZkInterruptedException e) { - logger.warn("ClusterEventProcessor caught a ZK connection interrupt", e); + logger + .warn("ClusterEventProcessor caught a ZK connection interrupt " + _processorName, e); interrupt(); } catch (ThreadDeath death) { + logger.error("ClusterEventProcessor caught a ThreadDeath " + _processorName, death); throw death; } catch (Throwable t) { - logger.error("ClusterEventProcessor failed while running the controller pipeline", t); + logger.error("ClusterEventProcessor failed while running the controller pipeline " +
[2/2] helix git commit: Improve message handling properly
Improve message handling properly We need some improvement for client side to help stabilize the message handling. Current scenario is that when process message, Helix marked the message read before really scheduled. In this case, if scheduling has problem, but the message already has been marked as read, no one will take care of the message and hang there forever. Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/1103fecb Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/1103fecb Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/1103fecb Branch: refs/heads/master Commit: 1103fecb67def5e610a7b22636ba4ac25e23777b Parents: 9659370 Author: Junkai Xue Authored: Mon Sep 17 11:51:31 2018 -0700 Committer: Junkai Xue Committed: Wed Oct 31 13:50:37 2018 -0700 -- .../helix/messaging/handling/HelixTaskExecutor.java | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/1103fecb/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java -- diff --git a/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java b/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java index 5e2082c..3ae90d3 100644 --- a/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java +++ b/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java @@ -952,22 +952,25 @@ public class HelixTaskExecutor implements MessageListener, TaskExecutor { if (readMsgs.size() > 0) { updateMessageState(readMsgs, accessor, instanceName); + // Remove message if schedule tasks are failed. for (Map.Entry handlerEntry : stateTransitionHandlers.entrySet()) { MessageHandler handler = handlerEntry.getValue(); NotificationContext context = stateTransitionContexts.get(handlerEntry.getKey()); Message msg = handler._message; -scheduleTask( -new HelixTask(msg, context, handler, this) -); +if (!scheduleTask(new HelixTask(msg, context, handler, this))) { + removeMessageFromTaskAndFutureMap(msg); + removeMessageFromZK(accessor, msg, instanceName); +} } for (int i = 0; i < nonStateTransitionHandlers.size(); i++) { MessageHandler handler = nonStateTransitionHandlers.get(i); NotificationContext context = nonStateTransitionContexts.get(i); Message msg = handler._message; -scheduleTask( -new HelixTask(msg, context, handler, this) -); +if (!scheduleTask(new HelixTask(msg, context, handler, this))) { + removeMessageFromTaskAndFutureMap(msg); + removeMessageFromZK(accessor, msg, instanceName); +} } } }
[jira] [Created] (HELIX-775) Task driver should support add/get task framework user content
Harry Zhang created HELIX-775: - Summary: Task driver should support add/get task framework user content Key: HELIX-775 URL: https://issues.apache.org/jira/browse/HELIX-775 Project: Apache Helix Issue Type: Task Reporter: Harry Zhang Assignee: Harry Zhang Task driver should support add/get task framework user content at workflow/job/task levels AC: * finish implementation * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670711#comment-16670711 ] ASF GitHub Bot commented on HELIX-775: -- GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/282 [HELIX-775] add task driver support for helix rest to add/get task fr… …amework user content consolidate user content related apis for task driver To consolidate task driver user content related apis, and corresponding rest apis, I'm deprecating the general getUserContent() api, but instead, we now have the following apis for get / add / update user content. ```java public void addOrUpdateWorkflowUserContentMap(String workflowName, final Map contentToAddOrUpdate); public void addOrUpdateJobUserContentMap(String workflowName, String jobName, final Map contentToAddOrUpdate); public void addOrUpdateTaskUserContentMap(String workflowName, String jobName, String taskPartitionId, final Map contentToAddOrUpdate); public Map getWorkflowUserContentMap(String workflowName); public Map getJobUserContentMap(String workflowName, String jobName); public Map getTaskUserContentMap(String workflowName, String jobName, String taskPartitionId); ``` API for deleting user content is TBD but can use the same convension You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-user-content Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/282.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #282 commit 7ec5313bccb679014d6a0605ee5d7184063e555e Author: Harry Zhang Date: 2018-10-31T20:55:44Z [HELIX-775] add task driver support for helix rest to add/get task framework user content > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
helix git commit: [HELIX-775] add task driver support for helix rest to add/get task framework user content
Repository: helix Updated Branches: refs/heads/master 1103fecb6 -> 7ec5313bc [HELIX-775] add task driver support for helix rest to add/get task framework user content Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/7ec5313b Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/7ec5313b Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/7ec5313b Branch: refs/heads/master Commit: 7ec5313bccb679014d6a0605ee5d7184063e555e Parents: 1103fec Author: Harry Zhang Authored: Wed Oct 31 13:55:44 2018 -0700 Committer: Harry Zhang Committed: Wed Oct 31 13:55:44 2018 -0700 -- .../java/org/apache/helix/task/TaskDriver.java | 35 ++ .../java/org/apache/helix/task/TaskUtil.java| 50 +--- 2 files changed, 78 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/7ec5313b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java index 54e3ab3..25a4fe4 100644 --- a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java +++ b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java @@ -1029,6 +1029,41 @@ public class TaskDriver { } /** + * Return the full user content map for workflow + * @param workflowName workflow name + * @return user content map + */ + public Map getWorkflowUserContentMap(String workflowName) { +return TaskUtil.getWorkflowJobUserContentMap(_propertyStore, workflowName); + } + + /** + * Return full user content map for job + * @param workflowName workflow name + * @param jobName Un-namespaced job name + * @return user content map + */ + public Map getJobUserContentMap(String workflowName, String jobName) { +String namespacedJobName = TaskUtil.getNamespacedJobName(workflowName, jobName); +return TaskUtil.getWorkflowJobUserContentMap(_propertyStore, namespacedJobName); + } + + /** + * Return full user content map for task + * @param workflowName workflow name + * @param jobName Un-namespaced job name + * @param taskPartitionId task partition id + * @return user content map + */ + public Map getTaskContentMap(String workflowName, String jobName, String taskPartitionId) { +String namespacedJobName = TaskUtil.getNamespacedJobName(workflowName, jobName); +String namespacedTaskName = TaskUtil.getNamespacedTaskName(namespacedJobName, taskPartitionId); +return TaskUtil.getTaskUserContentMap(_propertyStore, namespacedJobName, namespacedTaskName); + } + + + + /** * Set user content defined by the given key and string * @param key content key * @param value content value http://git-wip-us.apache.org/repos/asf/helix/blob/7ec5313b/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java b/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java index 3461233..5581b6f 100644 --- a/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java +++ b/helix-core/src/main/java/org/apache/helix/task/TaskUtil.java @@ -26,7 +26,6 @@ import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set; - import org.I0Itec.zkclient.DataUpdater; import org.apache.helix.AccessOption; import org.apache.helix.HelixDataAccessor; @@ -320,9 +319,22 @@ public class TaskUtil { */ protected static String getWorkflowJobUserContent(HelixPropertyStore propertyStore, String workflowJobResource, String key) { -ZNRecord r = propertyStore.get(Joiner.on("/").join(TaskConstants.REBALANCER_CONTEXT_ROOT, -workflowJobResource, USER_CONTENT_NODE), null, AccessOption.PERSISTENT); -return r != null ? r.getSimpleField(key) : null; +Map userContentMap = getWorkflowJobUserContentMap(propertyStore, workflowJobResource); +return userContentMap != null ? userContentMap.get(key) : null; + } + + /** + * get workflow/job user content map + * @param propertyStore property store + * @param workflowJobResource workflow name or namespaced job name + * @return user content map + */ + protected static Map getWorkflowJobUserContentMap( + HelixPropertyStore propertyStore, String workflowJobResource) { +ZNRecord record = propertyStore.get(Joiner.on("/") +.join(TaskConstants.REBALANCER_CONTEXT_ROOT, workflowJobResource, USER_CONTENT_NODE), null, +AccessOption.PERSISTENT); +return record != null ? record.getSimpleFields() : null; } /** @@ -365,10 +377,24 @@ public class TaskUtil { */ protected s
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670714#comment-16670714 ] ASF GitHub Bot commented on HELIX-775: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/282 > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-773) Support getLastScheduledTaskTimestamp information in workflow rest api
[ https://issues.apache.org/jira/browse/HELIX-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670718#comment-16670718 ] Hudson commented on HELIX-773: -- FAILURE: Integrated in Jenkins build helix #1558 (See [https://builds.apache.org/job/helix/1558/]) [HELIX-773] add getLastScheduledTaskTimestamp information in workflow (hrzhang: rev 566d4f166473b477ea0db1cfba5d04c8f3d6bf30) * (add) helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskExecInfo.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/WorkflowAccessor.java * (delete) helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskTimestamp.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (add) helix-core/src/main/java/org/apache/helix/task/TaskExecutionInfo.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestWorkflowAccessor.java > Support getLastScheduledTaskTimestamp information in workflow rest api > -- > > Key: HELIX-773 > URL: https://issues.apache.org/jira/browse/HELIX-773 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Support getLastScheduledTaskTimestamp information in workflow rest api -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-772) Support TaskDriver.addUserContent() api
[ https://issues.apache.org/jira/browse/HELIX-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670717#comment-16670717 ] Hudson commented on HELIX-772: -- FAILURE: Integrated in Jenkins build helix #1558 (See [https://builds.apache.org/job/helix/1558/]) [HELIX-772] add TaskDriver.addUserContent() api and related tests (hrzhang: rev 0c251bbf640206729755301c3dda734eea78343f) * (add) helix-core/src/test/java/org/apache/helix/task/TestGetSetUserContentStore.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java * (delete) helix-core/src/test/java/org/apache/helix/task/TestGetUserContentStore.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java > Support TaskDriver.addUserContent() api > --- > > Key: HELIX-772 > URL: https://issues.apache.org/jira/browse/HELIX-772 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support add user content in task driver > > AC: > * implement APi > * add test > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670726#comment-16670726 ] Hudson commented on HELIX-775: -- FAILURE: Integrated in Jenkins build helix #1559 (See [https://builds.apache.org/job/helix/1559/]) [HELIX-775] add task driver support for helix rest to add/get task (hrzhang: rev 7ec5313bccb679014d6a0605ee5d7184063e555e) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670730#comment-16670730 ] ASF GitHub Bot commented on HELIX-775: -- GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/283 [HELIX-775] consolidate user content related apis for task driver HELIX-1315: consolidate user content related apis for task driver To consolidate task driver user content related apis, and corresponding rest apis, I'm deprecating the general getUserContent() api, but instead, we now have the following apis for get / add / update user content. ```java public void addOrUpdateWorkflowUserContentMap(String workflowName, final Map contentToAddOrUpdate); public void addOrUpdateJobUserContentMap(String workflowName, String jobName, final Map contentToAddOrUpdate); public void addOrUpdateTaskUserContentMap(String workflowName, String jobName, String taskPartitionId, final Map contentToAddOrUpdate); public Map getWorkflowUserContentMap(String workflowName); public Map getJobUserContentMap(String workflowName, String jobName); public Map getTaskUserContentMap(String workflowName, String jobName, String taskPartitionId); ``` delete user content api tbd but can use the same convension You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-user-content Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/283.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #283 commit b235c4ee5a82c5970d29e839317ea242813a58bc Author: Harry Zhang Date: 2018-10-04T18:25:08Z [HELIX-775] consolidate user content related apis for task driver > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
helix git commit: [HELIX-775] consolidate user content related apis for task driver
Repository: helix Updated Branches: refs/heads/master 7ec5313bc -> b235c4ee5 [HELIX-775] consolidate user content related apis for task driver Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/b235c4ee Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/b235c4ee Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/b235c4ee Branch: refs/heads/master Commit: b235c4ee5a82c5970d29e839317ea242813a58bc Parents: 7ec5313 Author: Harry Zhang Authored: Thu Oct 4 11:25:08 2018 -0700 Committer: Harry Zhang Committed: Wed Oct 31 14:03:37 2018 -0700 -- .../java/org/apache/helix/task/TaskDriver.java | 64 +--- .../java/org/apache/helix/task/TaskUtil.java| 18 +++--- .../helix/task/TestGetSetUserContentStore.java | 53 +--- 3 files changed, 83 insertions(+), 52 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/b235c4ee/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java index 25a4fe4..e675c86 100644 --- a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java +++ b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java @@ -1022,7 +1022,12 @@ public class TaskDriver { * @param taskName name of task. Optional if scope is WORKFLOW or JOB * @return null if key-value pair not found or this content store does not exist. Otherwise, * return a String + * + * @deprecated use the following equivalents: {@link #getWorkflowUserContentMap(String)}, + * {@link #getJobUserContentMap(String, String)}, + * @{{@link #getTaskContentMap(String, String, String)}} */ + @Deprecated public String getUserContent(String key, UserContentStore.Scope scope, String workflowName, String jobName, String taskName) { return TaskUtil.getUserContent(_propertyStore, key, scope, workflowName, jobName, taskName); @@ -1055,36 +1060,53 @@ public class TaskDriver { * @param taskPartitionId task partition id * @return user content map */ - public Map getTaskContentMap(String workflowName, String jobName, String taskPartitionId) { + public Map getTaskUserContentMap(String workflowName, String jobName, + String taskPartitionId) { String namespacedJobName = TaskUtil.getNamespacedJobName(workflowName, jobName); String namespacedTaskName = TaskUtil.getNamespacedTaskName(namespacedJobName, taskPartitionId); return TaskUtil.getTaskUserContentMap(_propertyStore, namespacedJobName, namespacedTaskName); } + /** + * Add or update workflow user content with the given map - new keys will be added, and old + * keys will be updated + * @param workflowName workflow name + * @param contentToAddOrUpdate map containing items to add or update + */ + public void addOrUpdateWorkflowUserContentMap(String workflowName, + final Map contentToAddOrUpdate) { +TaskUtil +.addOrUpdateWorkflowJobUserContentMap(_propertyStore, workflowName, contentToAddOrUpdate); + } + /** + * Add or update job user content with the given map - new keys will be added, and old keys will + * be updated + * @param workflowName workflow name + * @param jobName Un-namespaced job name + * @param contentToAddOrUpdate map containing items to add or update + */ + public void addOrUpdateJobUserContentMap(String workflowName, String jobName, + final Map contentToAddOrUpdate) { +String namespacedJobName = TaskUtil.getNamespacedJobName(workflowName, jobName); +TaskUtil.addOrUpdateWorkflowJobUserContentMap(_propertyStore, namespacedJobName, +contentToAddOrUpdate); + } /** - * Set user content defined by the given key and string - * @param key content key - * @param value content value - * @param workflowName name of the workflow - must provide when scope is WORKFLOW - * @param jobName name of the job - must provide when scope is JOB or TASK - * @param taskName name of the task - must provide when scope is TASK - * @param scope scope of the content + * Add or update task user content with the given map - new keys will be added, and old keys + * will be updated + * @param workflowName workflow name + * @param jobName Un-namespaced job name + * @param taskPartitionId task partition id + * @param contentToAddOrUpdate map containing items to add or update */ - public void addUserContent(String key, String value, String workflowName, String jobName, String taskName, - UserContentStore.Scope scope) { -switch (scope) { -case WORKFLOW: - TaskUtil.addWorkflowJobUserContent(_propertyStore, workflowName, key, v
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670734#comment-16670734 ] ASF GitHub Bot commented on HELIX-775: -- Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/283 > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670739#comment-16670739 ] Hudson commented on HELIX-775: -- FAILURE: Integrated in Jenkins build helix #1560 (See [https://builds.apache.org/job/helix/1560/]) [HELIX-775] consolidate user content related apis for task driver (hrzhang: rev b235c4ee5a82c5970d29e839317ea242813a58bc) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (edit) helix-core/src/test/java/org/apache/helix/task/TestGetSetUserContentStore.java > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HELIX-776) REST2.0: Add delete command to updateInstanceConfig
Hunter L created HELIX-776: -- Summary: REST2.0: Add delete command to updateInstanceConfig Key: HELIX-776 URL: https://issues.apache.org/jira/browse/HELIX-776 Project: Apache Helix Issue Type: Improvement Reporter: Hunter L Assignee: Hunter L For instance configs, REST2.0 did not expose the REST API for deletion of fields. This RB adds update and delete commands to updateInstanceConfig and an integration test thereof. Changelist: 1. Add delete command to updateInstanceConfig in InstanceAccessor 2. Add integration tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HELIX-777) TASK: Handle null currentState for unscheduled tasks
Hunter L created HELIX-777: -- Summary: TASK: Handle null currentState for unscheduled tasks Key: HELIX-777 URL: https://issues.apache.org/jira/browse/HELIX-777 Project: Apache Helix Issue Type: Improvement Reporter: Hunter L Assignee: Hunter L It was observed that when a workflow is submitted and the Controller attempts to schedule its tasks, ZK read fails to read the appropriate job's context, causing the job to be stuck in an unscheduled state. The job remained unscheduled because it had no currentStates, and its job context did not contain any assignment/state information. This RB fixes such stuck states by detecting null currentStates. Changelist: 1. Check if currentState is null and if it is, manually assign an INIT state -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HELIX-778) TASK: Fix a race condition in updatePreviousAssignedTasksStatus
Hunter L created HELIX-778: -- Summary: TASK: Fix a race condition in updatePreviousAssignedTasksStatus Key: HELIX-778 URL: https://issues.apache.org/jira/browse/HELIX-778 Project: Apache Helix Issue Type: Improvement Reporter: Hunter L Assignee: Hunter L It was observed that TestUnregisteredCommand is very unstable. The reason was identified to be a race condition where when a task fails, sometimes a pending message for that task (from INIT to RUNNING) wasn't being cleaned up on time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would try to process that message and skip the status update of that task (like updating its status and NUM_ATTEMPTS field in JobContext). A short, temporary fix is to call markPartitionError() prior to checking the pending message, but over the long haul, we would need to revisit the task status update's design here to avoid this type of race conditions. Changelist: 1. Move markPartitionError() up before checking for a pending message on the task 2. Fix TestUnregisteredCommand's instability -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[1/3] helix git commit: [HELIX-776] REST2.0: Add delete command to updateInstanceConfig
Repository: helix Updated Branches: refs/heads/master b235c4ee5 -> ceba1a55a [HELIX-776] REST2.0: Add delete command to updateInstanceConfig For instance configs, REST2.0 did not expose the REST API for deletion of fields. This RB adds update and delete commands to updateInstanceConfig and an integration test thereof. Changelist: 1. Add delete command to updateInstanceConfig in InstanceAccessor 2. Add integration tests Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/6090732b Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/6090732b Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/6090732b Branch: refs/heads/master Commit: 6090732be6b88863017a93106fa692dc7350520b Parents: b235c4e Author: Hunter Lee Authored: Wed Oct 31 14:20:18 2018 -0700 Committer: Hunter Lee Committed: Wed Oct 31 14:20:18 2018 -0700 -- .../java/org/apache/helix/ConfigAccessor.java | 6 + .../resources/helix/InstanceAccessor.java | 34 +- .../helix/rest/server/TestInstanceAccessor.java | 114 ++- 3 files changed, 146 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/6090732b/helix-core/src/main/java/org/apache/helix/ConfigAccessor.java -- diff --git a/helix-core/src/main/java/org/apache/helix/ConfigAccessor.java b/helix-core/src/main/java/org/apache/helix/ConfigAccessor.java index 2755113..53f42fb 100644 --- a/helix-core/src/main/java/org/apache/helix/ConfigAccessor.java +++ b/helix-core/src/main/java/org/apache/helix/ConfigAccessor.java @@ -765,6 +765,12 @@ public class ConfigAccessor { .forParticipant(instanceName).build(); String zkPath = scope.getZkPath(); +if (!zkClient.exists(zkPath)) { + throw new HelixException( + "updateInstanceConfig failed. Given InstanceConfig does not already exist. instance: " + + instanceName); +} + if (overwrite) { ZKUtil.createOrReplace(zkClient, zkPath, instanceConfig.getRecord(), true); } else { http://git-wip-us.apache.org/repos/asf/helix/blob/6090732b/helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java -- diff --git a/helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java b/helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java index 98af0ee..38ee3b5 100644 --- a/helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java +++ b/helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java @@ -41,10 +41,12 @@ import org.apache.helix.model.ClusterConfig; import org.apache.helix.model.CurrentState; import org.apache.helix.model.Error; import org.apache.helix.model.HealthStat; +import org.apache.helix.model.HelixConfigScope; import org.apache.helix.model.InstanceConfig; import org.apache.helix.model.LiveInstance; import org.apache.helix.model.Message; import org.apache.helix.model.ParticipantHistory; +import org.apache.helix.model.builder.HelixConfigScopeBuilder; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.codehaus.jackson.JsonNode; @@ -321,7 +323,19 @@ public class InstanceAccessor extends AbstractHelixResource { @POST @Path("{instanceName}/configs") public Response updateInstanceConfig(@PathParam("clusterId") String clusterId, - @PathParam("instanceName") String instanceName, String content) { + @PathParam("instanceName") String instanceName, @QueryParam("command") String commandStr, + String content) { +Command command; +if (commandStr == null || commandStr.isEmpty()) { + command = Command.update; // Default behavior to keep it backward-compatible +} else { + try { +command = getCommand(commandStr); + } catch (HelixException ex) { +return badRequest(ex.getMessage()); + } +} + ZNRecord record; try { record = toZNRecord(content); @@ -332,11 +346,25 @@ public class InstanceAccessor extends AbstractHelixResource { InstanceConfig instanceConfig = new InstanceConfig(record); ConfigAccessor configAccessor = getConfigAccessor(); try { - configAccessor.updateInstanceConfig(clusterId, instanceName, instanceConfig); + switch (command) { + case update: +configAccessor.updateInstanceConfig(clusterId, instanceName, instanceConfig); +break; + case delete: { +HelixConfigScope instanceScope = +new HelixConfigScopeBuilder(HelixConfigScope.ConfigScopeProperty.PARTICIPANT) +.forCluster(clusterId).forParticipant(instanceName)
[2/3] helix git commit: [HELIX-777] TASK: Handle null currentState for unscheduled tasks
[HELIX-777] TASK: Handle null currentState for unscheduled tasks It was observed that when a workflow is submitted and the Controller attempts to schedule its tasks, ZK read fails to read the appropriate job's context, causing the job to be stuck in an unscheduled state. The job remained unscheduled because it had no currentStates, and its job context did not contain any assignment/state information. This RB fixes such stuck states by detecting null currentStates. Changelist: 1. Check if currentState is null and if it is, manually assign an INIT state Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/5d24ed54 Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/5d24ed54 Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/5d24ed54 Branch: refs/heads/master Commit: 5d24ed544898ff69f289f54be71a04413735d118 Parents: 6090732 Author: Hunter Lee Authored: Wed Oct 31 14:21:49 2018 -0700 Committer: Hunter Lee Committed: Wed Oct 31 17:17:16 2018 -0700 -- .../helix/task/AbstractTaskDispatcher.java | 213 +-- 1 file changed, 106 insertions(+), 107 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/5d24ed54/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java b/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java index 617263b..aa72f2d 100644 --- a/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java +++ b/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java @@ -23,7 +23,6 @@ import org.apache.helix.model.Partition; import org.apache.helix.model.ResourceAssignment; import org.apache.helix.monitoring.mbeans.ClusterStatusMonitor; import org.apache.helix.task.assigner.AssignableInstance; - import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -73,7 +72,6 @@ public abstract class AbstractTaskDispatcher { // this pending state transition, essentially "waiting" until this pending message clears Message pendingMessage = currStateOutput.getPendingMessage(jobResource, new Partition(pName), instance); - if (pendingMessage != null && !pendingMessage.getToState().equals(currState.name())) { // If there is a pending message whose destination state is different from the current // state, just make the same assignment as the pending message. This is essentially @@ -125,26 +123,26 @@ public abstract class AbstractTaskDispatcher { } switch (currState) { - case RUNNING: { -TaskPartitionState nextState = TaskPartitionState.RUNNING; -if (jobState == TaskState.TIMING_OUT) { - nextState = TaskPartitionState.TASK_ABORTED; -} else if (jobTgtState == TargetState.STOP) { - nextState = TaskPartitionState.STOPPED; -} else if (jobState == TaskState.ABORTED || jobState == TaskState.FAILED -|| jobState == TaskState.FAILING || jobState == TaskState.TIMED_OUT) { - // Drop tasks if parent job is not in progress - paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name())); - break; -} +case RUNNING: { + TaskPartitionState nextState = TaskPartitionState.RUNNING; + if (jobState == TaskState.TIMING_OUT) { +nextState = TaskPartitionState.TASK_ABORTED; + } else if (jobTgtState == TargetState.STOP) { +nextState = TaskPartitionState.STOPPED; + } else if (jobState == TaskState.ABORTED || jobState == TaskState.FAILED + || jobState == TaskState.FAILING || jobState == TaskState.TIMED_OUT) { +// Drop tasks if parent job is not in progress +paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name())); +break; + } -paMap.put(pId, new PartitionAssignment(instance, nextState.name())); -assignedPartitions.add(pId); -if (LOG.isDebugEnabled()) { - LOG.debug(String.format("Setting task partition %s state to %s on instance %s.", pName, - nextState, instance)); -} + paMap.put(pId, new PartitionAssignment(instance, nextState.name())); + assignedPartitions.add(pId); + if (LOG.isDebugEnabled()) { +LOG.debug(String.format("Setting task partition %s state to %s on instance %s.", pName, +nextState, instance)); } +} break; case STOPPED: { // TODO: This case statement migh
[3/3] helix git commit: [HELIX-778] TASK: Fix a race condition in updatePreviousAssignedTasksStatus
[HELIX-778] TASK: Fix a race condition in updatePreviousAssignedTasksStatus It was observed that TestUnregisteredCommand is very unstable. The reason was identified to be a race condition where when a task fails, sometimes a pending message for that task (from INIT to RUNNING) wasn't being cleaned up on time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would try to process that message and skip the status update of that task (like updating its status and NUM_ATTEMPTS field in JobContext). A short, temporary fix is to call markPartitionError() prior to checking the pending message, but over the long haul, we would need to revisit the task status update's design here to avoid this type of race conditions. Changelist: 1. Move markPartitionError() up before checking for a pending message on the task 2. Fix TestUnregisteredCommand's instability Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/ceba1a55 Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/ceba1a55 Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/ceba1a55 Branch: refs/heads/master Commit: ceba1a55ae351090144c001324f908f2364212a4 Parents: 5d24ed5 Author: Hunter Lee Authored: Wed Oct 31 17:20:37 2018 -0700 Committer: Hunter Lee Committed: Wed Oct 31 17:20:37 2018 -0700 -- .../apache/helix/task/AbstractTaskDispatcher.java| 15 --- .../integration/task/TestUnregisteredCommand.java| 3 ++- 2 files changed, 14 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/ceba1a55/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java b/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java index aa72f2d..cbf9fb8 100644 --- a/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java +++ b/helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java @@ -67,6 +67,16 @@ public abstract class AbstractTaskDispatcher { TaskPartitionState currState = updateJobContextAndGetTaskCurrentState(currStateOutput, jobResource, pId, pName, instance, jobCtx); +// This avoids a race condition in the case that although currentState is in the following +// error condition, the pending message (INIT->RUNNNING) might still be present. +// This is undesirable because this prevents JobContext from getting the proper update of +// fields including task state and task's NUM_ATTEMPTS +if (currState == TaskPartitionState.ERROR || currState == TaskPartitionState.TASK_ERROR +|| currState == TaskPartitionState.TIMED_OUT +|| currState == TaskPartitionState.TASK_ABORTED) { + markPartitionError(jobCtx, pId, currState, true); +} + // Check for pending state transitions on this (partition, instance). If there is a pending // state transition, we prioritize this pending state transition and set the assignment from // this pending state transition, essentially "waiting" until this pending message clears @@ -197,7 +207,6 @@ public abstract class AbstractTaskDispatcher { "Task partition %s has error state %s with msg %s. Marking as such in rebalancer context.", pName, currState, jobCtx.getPartitionInfo(pId))); } - markPartitionError(jobCtx, pId, currState, true); // The error policy is to fail the task as soon a single partition fails for a specified // maximum number of attempts or task is in ABORTED state. // But notice that if job is TIMED_OUT, aborted task won't be treated as fail and won't @@ -239,7 +248,6 @@ public abstract class AbstractTaskDispatcher { // Also release resources for these tasks assignableInstance.release(taskConfig, quotaType); - } else if (jobState == TaskState.IN_PROGRESS && (jobTgtState != TargetState.STOP && jobTgtState != TargetState.DELETE)) { // Job is in progress, implying that tasks are being re-tried, so set it to RUNNING @@ -940,7 +948,8 @@ public abstract class AbstractTaskDispatcher { private long getTimeoutTime(long startTime, long timeoutPeriod) { return (timeoutPeriod == TaskConstants.DEFAULT_NEVER_TIMEOUT -|| timeoutPeriod > Long.MAX_VALUE - startTime) // check long overflow +|| timeoutPeriod > Long.MAX_VALUE - startTime) +// check long overflow ? TaskConstants.DEFAULT_NEVER_TIMEOUT : startTime + timeoutPeriod; } http://git-wip-us.apache.org/repos/asf/helix/blob/ceba1a55/helix
helix git commit: Fix log argument order
Repository: helix Updated Branches: refs/heads/master ceba1a55a -> 89f351558 Fix log argument order Project: http://git-wip-us.apache.org/repos/asf/helix/repo Commit: http://git-wip-us.apache.org/repos/asf/helix/commit/89f35155 Tree: http://git-wip-us.apache.org/repos/asf/helix/tree/89f35155 Diff: http://git-wip-us.apache.org/repos/asf/helix/diff/89f35155 Branch: refs/heads/master Commit: 89f351558734b4bb95019452e4c58d6befeed0dd Parents: ceba1a5 Author: Junkai Xue Authored: Tue Oct 9 11:41:08 2018 -0700 Committer: Junkai Xue Committed: Wed Oct 31 17:28:04 2018 -0700 -- helix-core/src/main/java/org/apache/helix/task/TaskDriver.java | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/helix/blob/89f35155/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java -- diff --git a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java index e675c86..27670e9 100644 --- a/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java +++ b/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java @@ -420,8 +420,8 @@ public class TaskDriver { Set allNodes = jobDag.getAllNodes(); if (capacity > 0 && allNodes.size() + jobConfigs.size() >= capacity) { throw new IllegalStateException(String - .format("Queue %s already reaches its max capacity %f, failed to add %s", capacity, - queue, jobs.toString())); + .format("Queue %s already reaches its max capacity %f, failed to add %s", queue, + capacity, jobs.toString())); } String lastNodeName = null;
[jira] [Commented] (HELIX-776) REST2.0: Add delete command to updateInstanceConfig
[ https://issues.apache.org/jira/browse/HELIX-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670953#comment-16670953 ] Hudson commented on HELIX-776: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-776] REST2.0: Add delete command to updateInstanceConfig (hulee: rev 6090732be6b88863017a93106fa692dc7350520b) * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestInstanceAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java * (edit) helix-core/src/main/java/org/apache/helix/ConfigAccessor.java > REST2.0: Add delete command to updateInstanceConfig > --- > > Key: HELIX-776 > URL: https://issues.apache.org/jira/browse/HELIX-776 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > For instance configs, REST2.0 did not expose the REST API for deletion of > fields. This RB adds update and delete commands to updateInstanceConfig and > an integration test thereof. Changelist: 1. Add delete command to > updateInstanceConfig in InstanceAccessor 2. Add integration tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-778) TASK: Fix a race condition in updatePreviousAssignedTasksStatus
[ https://issues.apache.org/jira/browse/HELIX-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670955#comment-16670955 ] Hudson commented on HELIX-778: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-778] TASK: Fix a race condition in (hulee: rev ceba1a55ae351090144c001324f908f2364212a4) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestUnregisteredCommand.java * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java > TASK: Fix a race condition in updatePreviousAssignedTasksStatus > --- > > Key: HELIX-778 > URL: https://issues.apache.org/jira/browse/HELIX-778 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that TestUnregisteredCommand is very unstable. The reason was > identified to be a race condition where when a task fails, sometimes a > pending message for that task (from INIT to RUNNING) wasn't being cleaned up > on time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would > try to process that message and skip the status update of that task (like > updating its status and NUM_ATTEMPTS field in JobContext). > A short, temporary fix is to call markPartitionError() prior to checking the > pending message, but over the long haul, we would need to revisit the task > status update's design here to avoid this type of race conditions. > Changelist: > 1. Move markPartitionError() up before checking for a pending message on the > task > 2. Fix TestUnregisteredCommand's instability -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-777) TASK: Handle null currentState for unscheduled tasks
[ https://issues.apache.org/jira/browse/HELIX-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670954#comment-16670954 ] Hudson commented on HELIX-777: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-777] TASK: Handle null currentState for unscheduled tasks (hulee: rev 5d24ed544898ff69f289f54be71a04413735d118) * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java > TASK: Handle null currentState for unscheduled tasks > > > Key: HELIX-777 > URL: https://issues.apache.org/jira/browse/HELIX-777 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that when a workflow is submitted and the Controller attempts > to schedule its tasks, ZK read fails to read the appropriate job's context, > causing the job to be stuck in an unscheduled state. The job remained > unscheduled because it had no currentStates, and its job context did not > contain any assignment/state information. This RB fixes such stuck states by > detecting null currentStates. > Changelist: > 1. Check if currentState is null and if it is, manually assign an INIT state -- This message was sent by Atlassian JIRA (v7.6.3#76005)