[jira] [Commented] (HELIX-816) new Date.getTime() can be changed to System.currentTimeMillis()
[ https://issues.apache.org/jira/browse/HELIX-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816453#comment-16816453 ] Hudson commented on HELIX-816: -- FAILURE: Integrated in Jenkins build helix #1620 (See [https://builds.apache.org/job/helix/1620/]) HELIX-816 use System.currentTimeMillis() (narendly: rev efef0dbb3e2020546686ca617c03c59330ba9418) * (edit) helix-core/src/main/java/org/apache/helix/messaging/AsyncCallback.java * (edit) helix-core/src/main/java/org/apache/helix/messaging/handling/HelixStateTransitionHandler.java > new Date.getTime() can be changed to System.currentTimeMillis() > --- > > Key: HELIX-816 > URL: https://issues.apache.org/jira/browse/HELIX-816 > Project: Apache Helix > Issue Type: Bug >Reporter: bd2019us >Priority: Major > Labels: patch > Attachments: 1.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Hello, > I found that System.currentTimeMillis() can be used instead of new > Date().getTime(). > new Date() is a thin wrapper around the lightweight method > System.currentTimeMillis(), so performance suffers noticeably if it is > invoked too many times. > In my local testing in the same environment, > System.currentTimeMillis() was nearly 5 times faster (435 ms vs 2073 > ms) when each method was invoked 5,000,000 times. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
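The comparison described in HELIX-816 can be sketched roughly as follows. This is a minimal illustration, not the reporter's benchmark or the actual Helix patch: the class and method names (ClockReadSketch, viaDate, viaSystem) are made up here, and a naive timing loop like this is sensitive to JIT warm-up, so the numbers it prints will vary by machine and JVM.

```java
import java.util.Date;

public class ClockReadSketch {
    // Hypothetical helper: reads the wall clock through a freshly allocated Date.
    static long viaDate() {
        return new Date().getTime();
    }

    // Hypothetical helper: reads the wall clock directly, with no allocation.
    static long viaSystem() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        int iterations = 5_000_000; // same call count as quoted in the ticket
        long sink = 0;              // accumulate results so the loops are not optimized away

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaDate();
        }
        long dateMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaSystem();
        }
        long systemMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("new Date().getTime():       " + dateMillis + " ms");
        System.out.println("System.currentTimeMillis(): " + systemMillis + " ms");
        System.out.println("(ignore) " + sink);
    }
}
```

Both helpers return the same kind of value (milliseconds since the epoch), so swapping one for the other should not change behavior, only the allocation cost.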
[jira] [Commented] (HELIX-815) File.mkdir() may fail and cause crash.
[ https://issues.apache.org/jira/browse/HELIX-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807114#comment-16807114 ] Hudson commented on HELIX-815: -- FAILURE: Integrated in Jenkins build helix #1619 (See [https://builds.apache.org/job/helix/1619/]) [HELIX-815] fix bug to avoid potential crash (jiajunwang: rev f4b85691441d30fade4b9d9a3f85930399e7dbf7) * (edit) helix-core/src/main/java/org/apache/helix/tools/commandtools/ZkGrep.java > File.mkdir() may fail and cause crash. > -- > > Key: HELIX-815 > URL: https://issues.apache.org/jira/browse/HELIX-815 > Project: Apache Helix > Issue Type: Bug > Components: helix-core >Reporter: bd2019us >Priority: Major > Labels: pull-request-available > Attachments: 1.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Location: > helix-core/src/main/java/org/apache/helix/tools/commandtools/ZkGrep.java: 584 > File.mkdir() requires the parent directory to exist; otherwise the call > fails. To make directory creation safe, File.mkdirs() > is preferred because it also creates any missing parent directories. This > is important because if the call fails, the program may crash. Besides, mkdirs() > causes no extra overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
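The difference between the two calls in HELIX-815 can be shown with a small sketch. It is not the ZkGrep code: MkdirsSketch and ensureDirectory are hypothetical names, and the temporary directory exists only for the demonstration. Note that both File.mkdir() and File.mkdirs() report failure through a boolean return value rather than an exception, so the result is worth checking either way.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class MkdirsSketch {
    // Hypothetical helper: returns true if the directory exists when the call returns.
    // mkdirs() creates missing parents; the final isDirectory() check covers the case
    // where another thread or process created the directory in the meantime.
    static boolean ensureDirectory(File dir) {
        return dir.isDirectory() || dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("mkdirs-sketch").toFile();
        File nested = new File(base, "missing-parent/child");

        // mkdir() cannot create "child" because "missing-parent" does not exist yet.
        System.out.println("mkdir():  " + nested.mkdir());
        // mkdirs() creates "missing-parent" and then "child".
        System.out.println("mkdirs(): " + ensureDirectory(nested));
    }
}
```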
[jira] [Commented] (HELIX-814) HELIX: Add back ClusterDataCache for backward-compatibility
[ https://issues.apache.org/jira/browse/HELIX-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781130#comment-16781130 ] Hudson commented on HELIX-814: -- FAILURE: Integrated in Jenkins build helix #1612 (See [https://builds.apache.org/job/helix/1612/]) [HELIX-814] HELIX: Add back ClusterDataCache for backward-compatibility (narendly: rev e8fb0ad332cf9371ed7b7a084961c0d637ba5207) * (add) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java > HELIX: Add back ClusterDataCache for backward-compatibility > --- > > Key: HELIX-814 > URL: https://issues.apache.org/jira/browse/HELIX-814 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It was discovered that removing ClusterDataCache and changing public > interfaces (RebalanceStrategy, Rebalancer) caused backward-incompatibility. > This diff aims to solve this issue by creating a backward-compatible > ClusterDataCache (deprecated). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-810) HELIX: Fix NPE in InstanceMessagesCache
[ https://issues.apache.org/jira/browse/HELIX-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778871#comment-16778871 ] Hudson commented on HELIX-810: -- FAILURE: Integrated in Jenkins build helix #1607 (See [https://builds.apache.org/job/helix/1607/]) [HELIX-810] HELIX: Fix NPE in InstanceMessagesCache (narendly: rev c1f9af5b4ffb88a4893ac04890690946e97082e4) * (edit) helix-core/src/main/java/org/apache/helix/common/caches/InstanceMessagesCache.java > HELIX: Fix NPE in InstanceMessagesCache > --- > > Key: HELIX-810 > URL: https://issues.apache.org/jira/browse/HELIX-810 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that InstanceMessagesCache was throwing an NPE when it tried > to setRelayTime(). This is likely because some relay messages have target > instances that are no longer live (thus not in liveInstanceMap). > InstanceMessagesCache must handle this gracefully by skipping the operation. > We do not delete these msgs right away because the instance may come back > alive. Otherwise, after some time has passed, the msg will get expired by the > Controller and be removed. > Changelist: > 1. Add a try-catch block > 2. Improve logging -- This message was sent by Atlassian JIRA (v7.6.3#76005)
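The "skip instead of crash" behavior described in HELIX-810 could look something like the sketch below. It is only an illustration under stated assumptions: the ticket's changelist mentions a try-catch block, whereas this sketch uses an explicit null check to make the skipped case visible, and all names (RelayTimeSketch, setRelayTimes, the three maps) are hypothetical stand-ins rather than the InstanceMessagesCache API.

```java
import java.util.HashMap;
import java.util.Map;

public class RelayTimeSketch {
    // Hypothetical stand-in for the cache logic.
    //   targetByMessageId:    relay message id -> target instance name
    //   liveInstanceMap:      live instance name -> session start time (ms)
    //   relayTimeByMessageId: output, relay message id -> recorded relay time
    // Returns the number of messages skipped because their target is not live.
    static int setRelayTimes(Map<String, String> targetByMessageId,
                             Map<String, Long> liveInstanceMap,
                             Map<String, Long> relayTimeByMessageId) {
        int skipped = 0;
        for (Map.Entry<String, String> entry : targetByMessageId.entrySet()) {
            Long sessionStart = liveInstanceMap.get(entry.getValue());
            if (sessionStart == null) {
                // Target is not live: log and skip, but keep the message around
                // because the instance may come back.
                System.err.println("Skipping relay message " + entry.getKey()
                    + ": target instance " + entry.getValue() + " is not live");
                skipped++;
                continue;
            }
            relayTimeByMessageId.put(entry.getKey(), sessionStart);
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, String> targets = new HashMap<>();
        targets.put("msg-1", "host-a");
        targets.put("msg-2", "host-gone");
        Map<String, Long> live = new HashMap<>();
        live.put("host-a", 1000L);
        Map<String, Long> relayTimes = new HashMap<>();
        int skipped = setRelayTimes(targets, live, relayTimes);
        System.out.println("skipped=" + skipped + ", recorded=" + relayTimes);
    }
}
```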
[jira] [Commented] (HELIX-812) HELIX: Fix maintenance history bug
[ https://issues.apache.org/jira/browse/HELIX-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778873#comment-16778873 ] Hudson commented on HELIX-812: -- FAILURE: Integrated in Jenkins build helix #1607 (See [https://builds.apache.org/job/helix/1607/]) [HELIX-812] HELIX: Fix maintenance history bug (narendly: rev 4ecd923f279a53554fc66e9a71356af2f125c451) * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java * (edit) helix-core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/MaintenanceRecoveryStage.java > HELIX: Fix maintenance history bug > -- > > Key: HELIX-812 > URL: https://issues.apache.org/jira/browse/HELIX-812 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > There was a bug in maintenance history where when the cluster exits > maintenance mode automatically, it would record the exit action twice in > history. This is because each pipeline is designed to run > MaintenanceRecoveryStage twice. > Changelist: > 1. Add a flag so that if maintenanceSignal has been changed, just return > from MaintenanceRecoveryStage -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-809) TEST: Fix unstable TestClusterInMaintenanceModeWhenReachingMaxPartition
[ https://issues.apache.org/jira/browse/HELIX-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778870#comment-16778870 ] Hudson commented on HELIX-809: -- FAILURE: Integrated in Jenkins build helix #1607 (See [https://builds.apache.org/job/helix/1607/]) [HELIX-809] TEST: Fix unstable (narendly: rev ae586ac6f511ebed7bb01e0bec61f526b7d4b178) * (edit) helix-core/src/test/java/org/apache/helix/integration/rebalancer/TestClusterInMaintenanceModeWhenReachingMaxPartition.java > TEST: Fix unstable TestClusterInMaintenanceModeWhenReachingMaxPartition > --- > > Key: HELIX-809 > URL: https://issues.apache.org/jira/browse/HELIX-809 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > The pause for this was too short so the test was occasionally failing. This > RB fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-811) HELIX: Only log relayMsg if it doesn't exist
[ https://issues.apache.org/jira/browse/HELIX-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778872#comment-16778872 ] Hudson commented on HELIX-811: -- FAILURE: Integrated in Jenkins build helix #1607 (See [https://builds.apache.org/job/helix/1607/]) [HELIX-811] HELIX: Only log relayMsg if it doesn't exist (narendly: rev 7daecd030ca6555f4a7db645761122142c9188d1) * (edit) helix-core/src/main/java/org/apache/helix/common/caches/InstanceMessagesCache.java > HELIX: Only log relayMsg if it doesn't exist > > > Key: HELIX-811 > URL: https://issues.apache.org/jira/browse/HELIX-811 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > This log was flooding our log files. We need to change it so that relay > messages only get logged once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-808) TASK: Fix double-booking of tasks with task CurrentStates
[ https://issues.apache.org/jira/browse/HELIX-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778869#comment-16778869 ] Hudson commented on HELIX-808: -- FAILURE: Integrated in Jenkins build helix #1607 (See [https://builds.apache.org/job/helix/1607/]) [HELIX-808] TASK: Fix double-booking of tasks with task CurrentStates (narendly: rev 70b63558797d193164a08bc73eb99816cf3a6094) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestNoDoubleAssign.java * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobDispatcher.java > TASK: Fix double-booking of tasks with task CurrentStates > - > > Key: HELIX-808 > URL: https://issues.apache.org/jira/browse/HELIX-808 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that TestNoDoubleAssign was failing intermittently. > Debugging with more detailed logs revealed a race condition between newly > starting tasks and dropping tasks. To prevent this, dropping state > transitions will be prioritized and prevInstanceToTaskAssignment will be > built from CurrentStates. This is needed to make sure the right number of > tasks are assigned in every task pipeline and dropping transitions happen right > away. > > Changelist: > 1. Change the logic for generating prevInstToTaskAssignment so that it's > based on CurrentState > 2. Add a special check for not updating task partition state upon > Participant connection loss > 3. TestNoDoubleAssign passes consistently > 4. Fix TestNoDoubleAssign so that there won't be any thread leak -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-806) HELIX: Modify endpoints for instrumenting maintenance with custom fields
[ https://issues.apache.org/jira/browse/HELIX-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778867#comment-16778867 ] Hudson commented on HELIX-806: -- FAILURE: Integrated in Jenkins build helix #1607 (See [https://builds.apache.org/job/helix/1607/]) [HELIX-806] HELIX: Modify endpoints for instrumenting maintenance with (narendly: rev db33ae5dbe5c50c0db62d3abc828530bc114b0ea) * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestClusterAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ClusterAccessor.java > HELIX: Modify endpoints for instrumenting maintenance with custom fields > > > Key: HELIX-806 > URL: https://issues.apache.org/jira/browse/HELIX-806 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > We want to use content for users to input their KV mappings as a JSON string. > Changelist: > 1. Modify enable/disableMaintenanceMode endpoint logic > 2. Modify tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-807) REST: Add get maintenance signal endpoint
[ https://issues.apache.org/jira/browse/HELIX-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778868#comment-16778868 ] Hudson commented on HELIX-807: -- FAILURE: Integrated in Jenkins build helix #1607 (See [https://builds.apache.org/job/helix/1607/]) [HELIX-807] REST: Add get maintenance signal endpoint (narendly: rev a827f8fabd0eb6971aebe771c2d21c9b53c97214) * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/AbstractResource.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestClusterAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ClusterAccessor.java > REST: Add get maintenance signal endpoint > - > > Key: HELIX-807 > URL: https://issues.apache.org/jira/browse/HELIX-807 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Changelist: > 1. Add get maintenance signal endpoint > 2. Add a test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-805) Implementation on HelixAdmin to check if cluster in maintenance mode
[ https://issues.apache.org/jira/browse/HELIX-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778513#comment-16778513 ] Hudson commented on HELIX-805: -- FAILURE: Integrated in Jenkins build helix #1603 (See [https://builds.apache.org/job/helix/1603/]) [HELIX-805] Implementation on HelixAdmin to check if cluster in (ywang4: rev 150d12fefebbd60a1d989342f5e755d228a4dae0) * (edit) helix-core/src/main/java/org/apache/helix/HelixAdmin.java * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java * (edit) helix-core/src/test/java/org/apache/helix/mock/MockHelixAdmin.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java > Implementation on HelixAdmin to check if cluster in maintenance mode > > > Key: HELIX-805 > URL: https://issues.apache.org/jira/browse/HELIX-805 > Project: Apache Helix > Issue Type: Improvement >Reporter: Yi Wang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > AC: > * implementation of checking if cluster in maintenance mode in helixAdmin > interface > * Integration test added -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-804) [REST] Support customFields in enabling/disabling maintenance mode
[ https://issues.apache.org/jira/browse/HELIX-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778498#comment-16778498 ] Hudson commented on HELIX-804: -- FAILURE: Integrated in Jenkins build helix #1602 (See [https://builds.apache.org/job/helix/1602/]) [HELIX-804] REST: Support customFields in enabling/disabling maintenance (narendly: rev 07168eee784751661c298f888321c1493df1a46a) * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestClusterAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ClusterAccessor.java > [REST] Support customFields in enabling/disabling maintenance mode > -- > > Key: HELIX-804 > URL: https://issues.apache.org/jira/browse/HELIX-804 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > To improve operability around maintenance mode, we want to allow users to > specify a reason or customFields (KV mappings) when they enable/disable > maintenance mode. > Changelist: > 1. Modify logic in ClusterAccessor so that the user could pass in > customFields QueryParam in a JSON string > 2. Add an integration test that verifies the fields have been set -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-798) HELIX: Implement auto-exit of maintenance mode
[ https://issues.apache.org/jira/browse/HELIX-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778492#comment-16778492 ] Hudson commented on HELIX-798: -- FAILURE: Integrated in Jenkins build helix #1602 (See [https://builds.apache.org/job/helix/1602/]) [HELIX-798] HELIX: Implement auto-exit of maintenance mode (narendly: rev 313affc2e020068a1e5e7c5e92224d3299b1a404) * (edit) helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java * (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java * (edit) helix-core/src/main/java/org/apache/helix/model/MaintenanceSignal.java * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java * (edit) helix-core/src/test/java/org/apache/helix/mock/MockHelixAdmin.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java * (edit) helix-core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java * (edit) helix-core/src/main/java/org/apache/helix/HelixAdmin.java * (edit) helix-core/src/main/java/org/apache/helix/controller/pipeline/AsyncWorkerType.java * (add) helix-core/src/main/java/org/apache/helix/controller/stages/MaintenanceRecoveryStage.java > HELIX: Implement auto-exit of maintenance mode > -- > > Key: HELIX-798 > URL: https://issues.apache.org/jira/browse/HELIX-798 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > This diff contains the implementation and integration of a feature addition > for Helix: auto-exit of maintenance mode. > Changelist: > 1. BestPossibleCalcStage logic was modified so that it will use a new API > 2. IntermediateCalcStage logic was modified to check if the cluster is in > maintenance first > 3. enableMaintenance() API was deprecated and replaced with auto/manual APIs > while preserving backward-compatibility > 4. An async stage (MaintenanceRecoveryStage) was created and added to the > resource pipeline > 5. A series of integration tests were added for various exit/non-exit > scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-803) REST: Support GET of maintenance history
[ https://issues.apache.org/jira/browse/HELIX-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778497#comment-16778497 ] Hudson commented on HELIX-803: -- FAILURE: Integrated in Jenkins build helix #1602 (See [https://builds.apache.org/job/helix/1602/]) [HELIX-803] REST: Support GET of maintenance history (narendly: rev ebffa4693d5d2097ec6105698f2863a7ad1eda69) * (edit) helix-core/src/main/java/org/apache/helix/model/ControllerHistory.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestClusterAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/AbstractResource.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ClusterAccessor.java > REST: Support GET of maintenance history > > > Key: HELIX-803 > URL: https://issues.apache.org/jira/browse/HELIX-803 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > History entries regarding cluster maintenance mode will be recorded in the > HISTORY ZNode as part of the auto-exit of maintenance mode project. In order > to make it accessible programmatically, we are going to support a REST > endpoint that fetches maintenance history entries. > Changelist: > 1. Add an endpoint for retrieving maintenance history > 2. Add integration tests for retrieving controller leadership history and > maintenance history -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-800) TASK: Fix Participant-side log
[ https://issues.apache.org/jira/browse/HELIX-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778494#comment-16778494 ] Hudson commented on HELIX-800: -- FAILURE: Integrated in Jenkins build helix #1602 (See [https://builds.apache.org/job/helix/1602/]) [HELIX-800] TASK: Fix Participant-side log (narendly: rev 3d7c162d6e10589e8079a0690b538473bf2645de) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskStateModel.java > TASK: Fix Participant-side log > -- > > Key: HELIX-800 > URL: https://issues.apache.org/jira/browse/HELIX-800 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Previously, the log did not contain the instance name, which makes it not > very useful. This RB fixes this. > Changelist: > 1. Improve the log message -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-801) HELIX: Implement maintenance history for maintenance mode
[ https://issues.apache.org/jira/browse/HELIX-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778495#comment-16778495 ] Hudson commented on HELIX-801: -- FAILURE: Integrated in Jenkins build helix #1602 (See [https://builds.apache.org/job/helix/1602/]) [HELIX-801] HELIX: Implement maintenance history for maintenance mode (narendly: rev 4f863c3549631b5e9fccc6f40b513fff6fe435fa) * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java * (edit) helix-core/src/main/java/org/apache/helix/model/MaintenanceSignal.java * (edit) helix-core/src/main/java/org/apache/helix/PropertyKey.java * (delete) helix-core/src/main/java/org/apache/helix/model/LeaderHistory.java * (add) helix-core/src/main/java/org/apache/helix/model/ControllerHistory.java * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestControllerHistory.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ClusterAccessor.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java * (edit) helix-core/src/main/java/org/apache/helix/PropertyPathBuilder.java * (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/MaintenanceRecoveryStage.java > HELIX: Implement maintenance history for maintenance mode > - > > Key: HELIX-801 > URL: https://issues.apache.org/jira/browse/HELIX-801 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > This diff implements maintenance history for entering and exiting maintenance > mode. > Changelist: > 1. Implement a separate DataUpdater for LeaderHistory ZNode update > 2. Implement recording of maintenance history in LeaderHistory ZNode > 3. Fix the bug where only the last few history entries are kept -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-799) TEST: Fix TestTaskRebalancerFailover
[ https://issues.apache.org/jira/browse/HELIX-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778493#comment-16778493 ] Hudson commented on HELIX-799: -- FAILURE: Integrated in Jenkins build helix #1602 (See [https://builds.apache.org/job/helix/1602/]) [HELIX-799] TEST: Fix TestTaskRebalancerFailover (narendly: rev 010d786cc2df1b3e2f4c5c9e1e4c04f77527b2a9) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerFailover.java > TEST: Fix TestTaskRebalancerFailover > > > Key: HELIX-799 > URL: https://issues.apache.org/jira/browse/HELIX-799 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > This test was unstable in that the job in the JobQueue would never get > scheduled by the Controller. When you enqueue jobs to JobQueues, you need to > stop and ensure that the JobQueue is in STOPPED state first. This RB fixes > this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-802) HELIX: Filter out task resources from ExternalView computation
[ https://issues.apache.org/jira/browse/HELIX-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778496#comment-16778496 ] Hudson commented on HELIX-802: -- FAILURE: Integrated in Jenkins build helix #1602 (See [https://builds.apache.org/job/helix/1602/]) [HELIX-802] HELIX: Filter out task resources from ExternalView (narendly: rev d187ef4122a53c9a06e658aaf3139663b46c06ae) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestDisableJobExternalView.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ExternalViewComputeStage.java > HELIX: Filter out task resources from ExternalView computation > -- > > Key: HELIX-802 > URL: https://issues.apache.org/jira/browse/HELIX-802 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Helix will no longer support ExternalViews for task-related resources. > Contexts serve the same purpose. > Changelist: > 1. Remove task resources from resourceMap > 2. Modify TestDisableExternalView -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-792) HELIX: fix typo in WorkflowDataProvider
[ https://issues.apache.org/jira/browse/HELIX-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777484#comment-16777484 ] Hudson commented on HELIX-792: -- FAILURE: Integrated in Jenkins build helix #1600 (See [https://builds.apache.org/job/helix/1600/]) [HELIX-792] HELIX: fix typo in WorkflowDataProvider (narendly: rev 4561496b4fb8a5791e371b202ff1a0e2d11ff73c) * (edit) helix-core/src/main/java/org/apache/helix/controller/dataproviders/WorkflowControllerDataProvider.java > HELIX: fix typo in WorkflowDataProvider > --- > > Key: HELIX-792 > URL: https://issues.apache.org/jira/browse/HELIX-792 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > HELIX: fix typo in WorkflowDataProvider -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-793) TASK: Make TaskAssigner honor instance constraints
[ https://issues.apache.org/jira/browse/HELIX-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777485#comment-16777485 ] Hudson commented on HELIX-793: -- FAILURE: Integrated in Jenkins build helix #1600 (See [https://builds.apache.org/job/helix/1600/]) [HELIX-793] TASK: Make TaskAssigner honor instance constraints (narendly: rev 2f22df63daccb86836fba411d65e78fbecf174ee) * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java * (edit) helix-core/src/main/java/org/apache/helix/task/ThreadCountBasedTaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/TaskAssigner.java * (edit) helix-core/src/test/java/org/apache/helix/task/assigner/TestThreadCountBasedTaskAssigner.java * (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java > TASK: Make TaskAssigner honor instance constraints > -- > > Key: HELIX-793 > URL: https://issues.apache.org/jira/browse/HELIX-793 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Previously, ThreadCountBasedTaskAssigner was assigning to all > AssignableInstances. This could be problematic because some users > may wish to use InstanceGroupTags, in which case we must filter out instances > that do not have the appropriate tags. This RB adds logic that makes > TaskAssigner honor such constraints. > > Changelist: > 1. TaskAssigner only assigns to AssignableInstances contained in eligible > instances > 2. Add a test for this logic change -- This message was sent by Atlassian JIRA (v7.6.3#76005)
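The tag-based filtering described in HELIX-793 might be sketched as below. This is an assumption-laden illustration, not the ThreadCountBasedTaskAssigner code: TagFilterSketch, eligibleInstances, and the map of tags per instance are hypothetical, and treating a null tag as "no constraint" is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TagFilterSketch {
    // Hypothetical helper: keeps only instances that carry the required tag.
    // A null requiredTag is treated here as "no constraint" (an assumption).
    static List<String> eligibleInstances(Map<String, Set<String>> tagsByInstance,
                                          String requiredTag) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : tagsByInstance.entrySet()) {
            if (requiredTag == null || entry.getValue().contains(requiredTag)) {
                eligible.add(entry.getKey());
            }
        }
        Collections.sort(eligible); // deterministic order for the demonstration
        return eligible;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tags = new HashMap<>();
        tags.put("host-a", new HashSet<>(Collections.singletonList("gpu")));
        tags.put("host-b", new HashSet<>());
        tags.put("host-c", new HashSet<>(Collections.singletonList("gpu")));
        // Only tagged instances would then be offered to the task assigner.
        System.out.println(eligibleInstances(tags, "gpu"));
    }
}
```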
[jira] [Commented] (HELIX-794) TASK: Fix double-booking of tasks upon Participant disconnect
[ https://issues.apache.org/jira/browse/HELIX-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777486#comment-16777486 ] Hudson commented on HELIX-794: -- FAILURE: Integrated in Jenkins build helix #1600 (See [https://builds.apache.org/job/helix/1600/]) [HELIX-794] TASK: Fix double-booking of tasks upon Participant (narendly: rev ddb3690486a631efa9db704531781745d02ee546) * (add) helix-core/src/test/java/org/apache/helix/integration/task/TestNoDoubleAssign.java * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobDispatcher.java > TASK: Fix double-booking of tasks upon Participant disconnect > - > > Key: HELIX-794 > URL: https://issues.apache.org/jira/browse/HELIX-794 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It's been observed in production use cases that when there are transient > Participant connection issues, the Controller would fail to honor > maxNumberOfTasksPerInstance limit. That is to say, if the user wants only 1 > task from a job (limit is set to 1), Helix must assign up to 1 task onto an > instance. But upon short Participant disconnects, we saw 2 tasks in RUNNING > at the same time. > The cause for this is the incorrect calculation of jobConfigLimitation in > AbstractTaskDispatcher. This fixes this by utilizing a Map > (assignedPartitions) to calculate the correct number of tasks to assign. > Changelist: > 1. Modify an internal data structure (assignedPartitions) > 2. Fix the logic that calculates the number of tasks to assign -- This message was sent by Atlassian JIRA (v7.6.3#76005)
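The per-instance limit described in HELIX-794 amounts to subtracting what is already assigned from the configured cap. A minimal sketch of that arithmetic follows; it assumes a map from instance name to the set of task partition ids already assigned there (the ticket calls its map assignedPartitions, but the shape used here and the names TaskLimitSketch and tasksToAssign are hypothetical, not the AbstractTaskDispatcher code).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TaskLimitSketch {
    // Hypothetical helper: how many more tasks of a job may be placed on an instance.
    //   assignedPartitions:  instance name -> task partitions already assigned there
    //   maxTasksPerInstance: per-instance cap for the job (e.g. 1 in the ticket's example)
    static int tasksToAssign(Map<String, Set<Integer>> assignedPartitions,
                             String instance,
                             int maxTasksPerInstance) {
        Set<Integer> assigned = assignedPartitions.get(instance);
        int alreadyAssigned = (assigned == null) ? 0 : assigned.size();
        // Never return a negative number, even if the cap was lowered after assignment.
        return Math.max(0, maxTasksPerInstance - alreadyAssigned);
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> assigned = new HashMap<>();
        assigned.put("host-a", new HashSet<>(Arrays.asList(3)));
        System.out.println("host-a: " + tasksToAssign(assigned, "host-a", 1));
        System.out.println("host-b: " + tasksToAssign(assigned, "host-b", 1));
    }
}
```

With a cap of 1, an instance that still holds a task from before a short disconnect gets no additional task, which is the behavior the ticket says was violated.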
[jira] [Commented] (HELIX-795) TASK: Drop tasks upon Participant reconnect
[ https://issues.apache.org/jira/browse/HELIX-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777487#comment-16777487 ] Hudson commented on HELIX-795: -- FAILURE: Integrated in Jenkins build helix #1600 (See [https://builds.apache.org/job/helix/1600/]) [HELIX-795] TASK: Drop tasks upon Participant reconnect (narendly: rev f9f89a79768156ef7341262cbb25a40d7dafeb1e) * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/CurStateCarryOverUpdater.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobDispatcher.java * (edit) helix-core/src/main/java/org/apache/helix/participant/HelixStateMachineEngine.java * (add) helix-core/src/test/java/org/apache/helix/integration/task/TestDropOnParticipantReset.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskStateModel.java > TASK: Drop tasks upon Participant reconnect > --- > > Key: HELIX-795 > URL: https://issues.apache.org/jira/browse/HELIX-795 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > This changes the default reset() behavior for tasks on Participants. > Previously, it would send all task partitions to INIT. After this change, the > task partitions will inherit the states from the previous session, and their > RequestedState will be set to DROPPED. Then the Controller will send messages > to drop the said task partitions so that there are no quota/resource leaks > for the number of tasks on Participants. > Changelist: > 1. Modify state transition logic so that drop state transitions messages > will be honored > 2. Modify CurrentState copy-over logic > 3. Add an integration test: TestDropOnParticipantReset -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-796) HELIX: Add fields to MaintenanceSignal
[ https://issues.apache.org/jira/browse/HELIX-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777488#comment-16777488 ] Hudson commented on HELIX-796: -- FAILURE: Integrated in Jenkins build helix #1600 (See [https://builds.apache.org/job/helix/1600/]) [HELIX-796] HELIX: Add fields to MaintenanceSignal (narendly: rev 4cd7269e933602de6029534cf20290d7955e0801) * (edit) helix-core/src/main/java/org/apache/helix/model/MaintenanceSignal.java > HELIX: Add fields to MaintenanceSignal > -- > > Key: HELIX-796 > URL: https://issues.apache.org/jira/browse/HELIX-796 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > We need to add extra fields to MaintenanceSignal in order for Helix > Controller to determine how the cluster was entered into maintenance mode. > Changelist: > 1. Add TRIGGERED_BY and TIMESTAMP fields and getters and setters -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-791) TASK2.0: Add RuntimeJobDag with job iterator functionality
[ https://issues.apache.org/jira/browse/HELIX-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777448#comment-16777448 ] Hudson commented on HELIX-791: -- FAILURE: Integrated in Jenkins build helix #1596 (See [https://builds.apache.org/job/helix/1596/]) [HELIX-791] TASK2.0: Add RuntimeJobDag with job iterator functionality (narendly: rev 429bc24a6e35a3a9ef8f10dc3b5c8c0553c6bf4c) * (add) helix-core/src/main/java/org/apache/helix/task/RuntimeJobDag.java * (add) helix-core/src/test/java/org/apache/helix/integration/task/TestRuntimeJobDag.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobDag.java > TASK2.0: Add RuntimeJobDag with job iterator functionality > -- > > Key: HELIX-791 > URL: https://issues.apache.org/jira/browse/HELIX-791 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Job list iterator methods and underlying data structure were added to JobDag > to support retrieval of jobs by TaskDispatcher (to be implemented) for > improvement in Task Framework. > Changelist: > 1. Add RuntimeJobDag > 2. Add a unit test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-790) REST2.0: Add support for updating IdealState
[ https://issues.apache.org/jira/browse/HELIX-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687000#comment-16687000 ] Hudson commented on HELIX-790: -- FAILURE: Integrated in Jenkins build helix #1573 (See [https://builds.apache.org/job/helix/1573/]) [HELIX-790] REST2.0: Add support for updating IdealState (narendly: rev abc6969d754e01c76278c266d08cc4e9fb80e910) * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/AbstractResource.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ResourceAccessor.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestResourceAccessor.java * (edit) helix-core/src/main/java/org/apache/helix/HelixAdmin.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java * (edit) helix-core/src/test/java/org/apache/helix/mock/MockHelixAdmin.java > REST2.0: Add support for updating IdealState > > > Key: HELIX-790 > URL: https://issues.apache.org/jira/browse/HELIX-790 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > There was a user request for a REST endpoint that allows users to > add/delete/modify fields in IdealState ZNodes. > Changelist: > 1. Add updateResourceIdealState in ResourceAccessor > 2. Add update APIs in HelixAdmin > 3. Add an integration test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-789) REST2.0: Add support for update and delete for ResourceConfig
[ https://issues.apache.org/jira/browse/HELIX-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686999#comment-16686999 ] Hudson commented on HELIX-789: -- FAILURE: Integrated in Jenkins build helix #1573 (See [https://builds.apache.org/job/helix/1573/]) [HELIX-789] REST2.0: Add support for update and delete for (narendly: rev 22fa03f3d8bb0863913f2a6614574443130e500a) * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestResourceAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ResourceAccessor.java > REST2.0: Add support for update and delete for ResourceConfig > - > > Key: HELIX-789 > URL: https://issues.apache.org/jira/browse/HELIX-789 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Previous implementation of updateResourceConfig did not allow deletion of > fields in ResourceConfig in ZK. This RB refactors the REST endpoint. > Changelist: > 1. Add command support for updateResourceConfig > 2. Add integration tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-788) HELIX: Fix DefaultPipeline so that it doesn't rebalance task resources
[ https://issues.apache.org/jira/browse/HELIX-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673703#comment-16673703 ] Hudson commented on HELIX-788: -- FAILURE: Integrated in Jenkins build helix #1570 (See [https://builds.apache.org/job/helix/1570/]) [HELIX-788] HELIX: Fix DefaultPipeline so that it doesn't rebalance task (narendly: rev 59536d39c85d3535408a40a46a1a60a4105ee6e4) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ResourceComputationStage.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java * (edit) helix-core/src/test/java/org/apache/helix/controller/stages/TestStateTransitionPrirority.java > HELIX: Fix DefaultPipeline so that it doesn't rebalance task resources > -- > > Key: HELIX-788 > URL: https://issues.apache.org/jira/browse/HELIX-788 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Helix CHO testing indicated that the default pipeline was rebalancing task > framework resources. This RB fixes this. > Changelist: > 1. Change resourceMap to resourceToRebalance, which separates generic and > task resources > 2. Make logger use LogUtil to distinguish two pipelines -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-786) TEST: Make TestQuotaBasedScheduling stable
[ https://issues.apache.org/jira/browse/HELIX-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673702#comment-16673702 ] Hudson commented on HELIX-786: -- FAILURE: Integrated in Jenkins build helix #1570 (See [https://builds.apache.org/job/helix/1570/]) [HELIX-786] TEST: Make TestQuotaBasedScheduling stable (narendly: rev bced0996ed65c9a886b5e04788e2cc1c88fc37b1) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java [HELIX-786] TASK: Fix stuck tasks after Participant connection loss (narendly: rev dc25bac1ebdcddb08aaab2765abfe72008b06a31) * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java > TEST: Make TestQuotaBasedScheduling stable > -- > > Key: HELIX-786 > URL: https://issues.apache.org/jira/browse/HELIX-786 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Because recent changes caused the Controller to run slower, > TestQuotaBasedScheduling had become unstable. This RB fixes this. > Changelist: > 1. Use polling instead of Thread.sleep() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
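Editor's sketch for the "polling instead of Thread.sleep()" item above: a fixed sleep fails when the Controller is slower than expected, whereas polling returns as soon as the condition holds and only gives up at a deadline. The class and method names below (PollingSketch, waitUntil) are hypothetical and are not taken from the Helix test code; this is only a minimal illustration of the idea.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Helix: polls a condition instead of sleeping a fixed time. */
public final class PollingSketch {
  private PollingSketch() {}

  /**
   * Re-checks the condition every intervalMs until it is true or timeoutMs has elapsed.
   * Returns true if the condition was observed to hold, false on timeout or interruption.
   */
  public static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
    long deadlineNs = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      long remainingNs = deadlineNs - System.nanoTime();
      if (remainingNs <= 0) {
        return false;
      }
      // Never sleep past the deadline, but sleep at least 1 ms to avoid a busy loop.
      long sleepMs = Math.min(intervalMs, Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNs)));
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

A test would then assert on the return value, for example waitUntil(() -> jobIsDone(), 10_000L, 100L), where jobIsDone() stands for whatever check the test previously performed after its fixed sleep.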
[jira] [Commented] (HELIX-785) Report helix latency instead of user latency during top state handoff
[ https://issues.apache.org/jira/browse/HELIX-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673669#comment-16673669 ] Hudson commented on HELIX-785: -- FAILURE: Integrated in Jenkins build helix #1569 (See [https://builds.apache.org/job/helix/1569/]) [HELIX-785] Record helix latency instead of user latency in top state (hrzhang: rev ccf263c9110fa5af54f1c1cabd8b5a2af64d473e) * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java * (edit) helix-core/src/test/java/org/apache/helix/monitoring/mbeans/TestTopStateHandoffMetrics.java * (edit) helix-core/src/test/resources/TestTopStateHandoffMetrics.json * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/TopStateHandoffReportStage.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java > Report helix latency instead of user latency during top state handoff > - > > Key: HELIX-785 > URL: https://issues.apache.org/jira/browse/HELIX-785 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Currently we are reporting top state handoff user latency, but we should > report Helix latency instead. Users should have their own way of monitoring > their own state transitions. > AC: > 1. Implement reporting Helix latency for top state handoff and test it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-780) Support get/add rest api for workflow/job/task user content
[ https://issues.apache.org/jira/browse/HELIX-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672466#comment-16672466 ] Hudson commented on HELIX-780: -- FAILURE: Integrated in Jenkins build helix #1567 (See [https://builds.apache.org/job/helix/1567/]) [HELIX-780] add task user content related api and added more tests (hrzhang: rev 18aa67b6d5c703e5b938b2f915f52a6ca856e889) * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestJobAccessor.java * (add) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/TaskAccessor.java * (add) helix-rest/src/test/java/org/apache/helix/rest/server/TestTaskAccessor.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestWorkflowAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/JobAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/WorkflowAccessor.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/AbstractTestClass.java > Support get/add rest api for workflow/job/task user content > --- > > Key: HELIX-780 > URL: https://issues.apache.org/jira/browse/HELIX-780 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support get/add rest api for workflow/job/task user content > AC: > * finish implementation > * test code -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-784) TASK: Fix a bug in getExpiredJobs
[ https://issues.apache.org/jira/browse/HELIX-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672390#comment-16672390 ] Hudson commented on HELIX-784: -- FAILURE: Integrated in Jenkins build helix #1566 (See [https://builds.apache.org/job/helix/1566/]) [HELIX-784] TASK: Fix a bug in getExpiredJobs (narendly: rev befb1036f8d8be2729a800d3dde88fc1362a6489) * (delete) helix-core/src/test/java/org/apache/helix/controller/stages/TestTaskPersistDataStage.java * (add) helix-core/src/test/java/org/apache/helix/controller/stages/TestTaskStage.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java > TASK: Fix a bug in getExpiredJobs > - > > Key: HELIX-784 > URL: https://issues.apache.org/jira/browse/HELIX-784 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > getExpiredJobs(), when the job config is null, would just continue instead of > adding it to expiredJobs so that the job cleanup/purge would be re-tried. > This could possibly cause purge failures to leave a lot of jobs un-purged > with just the job config missing in ZK. This RB fixes this. > Changelist: > 1. Add the job name to expiredJobs if the job config does not exist in ZK > 2. Add a more detailed description in the error log > 3. Add an integration test for two task-related stages: TaskPersistDataStage > and TaskGarbageCollectionStage in TestTaskStage.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
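Editor's sketch of the getExpiredJobs fix described above, under heavy simplification: jobs whose config is missing are added to the expired set rather than skipped, so cleanup is retried for them. The class ExpiredJobsSketch, its method, and the use of a plain map of expiry timestamps (with a missing entry standing in for a null job config) are all assumptions made for illustration; the real TaskUtil logic also looks at job state and is not reproduced here.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the fix; not the Helix TaskUtil implementation. */
public final class ExpiredJobsSketch {
  private ExpiredJobsSketch() {}

  /**
   * expiryByJob maps a job name to its expiry time in ms. A missing (null) entry stands in
   * for "job config does not exist in ZK" in this sketch.
   */
  public static Set<String> expiredJobs(
      Iterable<String> jobNames, Map<String, Long> expiryByJob, long nowMs) {
    Set<String> expired = new LinkedHashSet<>();
    for (String job : jobNames) {
      Long expiry = expiryByJob.get(job);
      if (expiry == null) {
        // Before the fix this case was skipped with "continue", leaving the job un-purged.
        expired.add(job);
        continue;
      }
      if (expiry <= nowMs) {
        expired.add(job);
      }
    }
    return expired;
  }
}
```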
[jira] [Commented] (HELIX-782) TASK: Make TaskDriver use ZKClient's create when creating workflows
[ https://issues.apache.org/jira/browse/HELIX-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672389#comment-16672389 ] Hudson commented on HELIX-782: -- FAILURE: Integrated in Jenkins build helix #1566 (See [https://builds.apache.org/job/helix/1566/]) [HELIX-782] TASK: Make TaskDriver use ZKClient's create when creating (narendly: rev 3844ad60034b029f3bbd916f629a7969117c1b26) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (add) helix-core/src/test/java/org/apache/helix/task/TestWorkflowCreation.java > TASK: Make TaskDriver use ZKClient's create when creating workflows > --- > > Key: HELIX-782 > URL: https://issues.apache.org/jira/browse/HELIX-782 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > TaskDriver should use create() but currently is using set(), which just > overwrites ZNodes that are in ZK. This is undesirable and we need to fix it, > especially in the wake of ZNode restructuring. > AC: > 1. Make TaskDriver use create() instead of set() > 2. Add an integration test: > TestWorkflowCreation:testWorkflowCreationNoDuplicates() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
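Editor's sketch of the create() versus set() distinction discussed above. It uses an in-memory map as a stand-in for ZooKeeper, so it does not show the ZkClient or TaskDriver API; the class CreateVsSetSketch and its methods are hypothetical. The point is only that set-style writes silently overwrite an existing node, while create-style writes fail when the node already exists, which is what prevents duplicate workflows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a ZNode store; not the ZkClient API. */
public final class CreateVsSetSketch {
  private final Map<String, String> nodes = new ConcurrentHashMap<>();

  /** set semantics: writes the data whether or not the path already exists. */
  public void set(String path, String data) {
    nodes.put(path, data);
  }

  /** create semantics: writes only if the path is absent; returns false if it already exists. */
  public boolean create(String path, String data) {
    return nodes.putIfAbsent(path, data) == null;
  }

  public String get(String path) {
    return nodes.get(path);
  }
}
```

With this model, submitting a workflow under an existing name via create() is rejected and the original data survives, whereas set() would have replaced it without any signal to the caller.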
[jira] [Commented] (HELIX-780) Support get/add rest api for workflow/job/task user content
[ https://issues.apache.org/jira/browse/HELIX-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672230#comment-16672230 ] Hudson commented on HELIX-780: -- FAILURE: Integrated in Jenkins build helix #1564 (See [https://builds.apache.org/job/helix/1564/]) [HELIX-780] add get/add job user content rest api (hrzhang: rev a09a18ac55464c3e399800b4474ccb6e64d168ec) * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/JobAccessor.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestJobAccessor.java > Support get/add rest api for workflow/job/task user content > --- > > Key: HELIX-780 > URL: https://issues.apache.org/jira/browse/HELIX-780 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support get/add rest api for workflow/job/task user content > AC: > * finish implementation > * test code -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-780) Support get/add rest api for workflow/job/task user content
[ https://issues.apache.org/jira/browse/HELIX-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672042#comment-16672042 ] Hudson commented on HELIX-780: -- FAILURE: Integrated in Jenkins build helix #1563 (See [https://builds.apache.org/job/helix/1563/]) [HELIX-780] add get/add user content for workflow rest api (hrzhang: rev 71e4b6a66af1ae56a3667d5f6f5ca7ac63080997) * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/WorkflowAccessor.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/AbstractTestClass.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestWorkflowAccessor.java > Support get/add rest api for workflow/job/task user content > --- > > Key: HELIX-780 > URL: https://issues.apache.org/jira/browse/HELIX-780 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support get/add rest api for workflow/job/task user content > AC: > * finish implementation > * test code -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-779) Maintenance rebalancer should not clear preference list in ideal state
[ https://issues.apache.org/jira/browse/HELIX-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671995#comment-16671995 ] Hudson commented on HELIX-779: -- FAILURE: Integrated in Jenkins build helix #1562 (See [https://builds.apache.org/job/helix/1562/]) [HELIX-779] do not clean list field in maintenance rebalancer for new (hrzhang: rev bfaa8399529b6e63b307c1fbe60903c3ca08fbb1) * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java * (edit) helix-core/src/main/java/org/apache/helix/controller/rebalancer/MaintenanceRebalancer.java > Maintenance rebalancer should not clear preference list in ideal state > -- > > Key: HELIX-779 > URL: https://issues.apache.org/jira/browse/HELIX-779 > Project: Apache Helix > Issue Type: Bug > Components: helix-core >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Setting list fields to an empty map will prevent resources that were newly > added and initially rebalanced during maintenance mode from getting > rebalanced after the cluster exits maintenance mode. > The right thing to do is to clear every preference list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-777) TASK: Handle null currentState for unscheduled tasks
[ https://issues.apache.org/jira/browse/HELIX-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670954#comment-16670954 ] Hudson commented on HELIX-777: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-777] TASK: Handle null currentState for unscheduled tasks (hulee: rev 5d24ed544898ff69f289f54be71a04413735d118) * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java > TASK: Handle null currentState for unscheduled tasks > > > Key: HELIX-777 > URL: https://issues.apache.org/jira/browse/HELIX-777 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that when a workflow is submitted and the Controller attempts > to schedule its tasks, ZK read fails to read the appropriate job's context, > causing the job to be stuck in an unscheduled state. The job remained > unscheduled because it had no currentStates, and its job context did not > contain any assignment/state information. This RB fixes such stuck states by > detecting null currentStates. > Changelist: > 1. Check if currentState is null and if it is, manually assign an INIT state -- This message was sent by Atlassian JIRA (v7.6.3#76005)
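Editor's sketch of the null-currentState handling described above: when no current state is available for a task, fall back to INIT so the task can be scheduled instead of staying stuck. The class and method names are hypothetical, and states are modeled as plain strings here rather than the Helix task state types.

```java
/** Hypothetical illustration of the null check; not the AbstractTaskDispatcher code. */
public final class NullCurrentStateSketch {
  /** Stand-in for the task INIT state; modeled as a string in this sketch. */
  static final String INIT = "INIT";

  private NullCurrentStateSketch() {}

  /** Returns the reported state, or INIT when no current state could be read. */
  public static String effectiveState(String currentState) {
    return currentState == null ? INIT : currentState;
  }
}
```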
[jira] [Commented] (HELIX-776) REST2.0: Add delete command to updateInstanceConfig
[ https://issues.apache.org/jira/browse/HELIX-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670953#comment-16670953 ] Hudson commented on HELIX-776: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-776] REST2.0: Add delete command to updateInstanceConfig (hulee: rev 6090732be6b88863017a93106fa692dc7350520b) * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestInstanceAccessor.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java * (edit) helix-core/src/main/java/org/apache/helix/ConfigAccessor.java > REST2.0: Add delete command to updateInstanceConfig > --- > > Key: HELIX-776 > URL: https://issues.apache.org/jira/browse/HELIX-776 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > For instance configs, REST2.0 did not expose the REST API for deletion of > fields. This RB adds update and delete commands to updateInstanceConfig and > an integration test thereof. Changelist: 1. Add delete command to > updateInstanceConfig in InstanceAccessor 2. Add integration tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-778) TASK: Fix a race condition in updatePreviousAssignedTasksStatus
[ https://issues.apache.org/jira/browse/HELIX-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670955#comment-16670955 ] Hudson commented on HELIX-778: -- FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/]) [HELIX-778] TASK: Fix a race condition in (hulee: rev ceba1a55ae351090144c001324f908f2364212a4) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestUnregisteredCommand.java * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java > TASK: Fix a race condition in updatePreviousAssignedTasksStatus > --- > > Key: HELIX-778 > URL: https://issues.apache.org/jira/browse/HELIX-778 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It was observed that TestUnregisteredCommand is very unstable. The reason was > identified to be a race condition where when a task fails, sometimes a > pending message for that task (from INIT to RUNNING) wasn't being cleaned up > on time, so AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would > try to process that message and skip the status update of that task (like > updating its status and NUM_ATTEMPTS field in JobContext). > A short, temporary fix is to call markPartitionError() prior to checking the > pending message, but over the long haul, we would need to revisit the task > status update's design here to avoid this type of race conditions. > Changelist: > 1. Move markPartitionError() up before checking for a pending message on the > task > 2. Fix TestUnregisteredCommand's instability -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670739#comment-16670739 ] Hudson commented on HELIX-775: -- FAILURE: Integrated in Jenkins build helix #1560 (See [https://builds.apache.org/job/helix/1560/]) [HELIX-775] consolidate user content related apis for task driver (hrzhang: rev b235c4ee5a82c5970d29e839317ea242813a58bc) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (edit) helix-core/src/test/java/org/apache/helix/task/TestGetSetUserContentStore.java > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-775) Task driver should support add/get task framework user content
[ https://issues.apache.org/jira/browse/HELIX-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670726#comment-16670726 ] Hudson commented on HELIX-775: -- FAILURE: Integrated in Jenkins build helix #1559 (See [https://builds.apache.org/job/helix/1559/]) [HELIX-775] add task driver support for helix rest to add/get task (hrzhang: rev 7ec5313bccb679014d6a0605ee5d7184063e555e) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java > Task driver should support add/get task framework user content > -- > > Key: HELIX-775 > URL: https://issues.apache.org/jira/browse/HELIX-775 > Project: Apache Helix > Issue Type: Task >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Task driver should support add/get task framework user content at > workflow/job/task levels > > AC: > * finish implementation > * add tests -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-773) Support getLastScheduledTaskTimestamp information in workflow rest api
[ https://issues.apache.org/jira/browse/HELIX-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670718#comment-16670718 ] Hudson commented on HELIX-773: -- FAILURE: Integrated in Jenkins build helix #1558 (See [https://builds.apache.org/job/helix/1558/]) [HELIX-773] add getLastScheduledTaskTimestamp information in workflow (hrzhang: rev 566d4f166473b477ea0db1cfba5d04c8f3d6bf30) * (add) helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskExecInfo.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/WorkflowAccessor.java * (delete) helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskTimestamp.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (add) helix-core/src/main/java/org/apache/helix/task/TaskExecutionInfo.java * (edit) helix-rest/src/test/java/org/apache/helix/rest/server/TestWorkflowAccessor.java > Support getLastScheduledTaskTimestamp information in workflow rest api > -- > > Key: HELIX-773 > URL: https://issues.apache.org/jira/browse/HELIX-773 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Support getLastScheduledTaskTimestamp information in workflow rest api -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-772) Support TaskDriver.addUserContent() api
[ https://issues.apache.org/jira/browse/HELIX-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670717#comment-16670717 ] Hudson commented on HELIX-772: -- FAILURE: Integrated in Jenkins build helix #1558 (See [https://builds.apache.org/job/helix/1558/]) [HELIX-772] add TaskDriver.addUserContent() api and related tests (hrzhang: rev 0c251bbf640206729755301c3dda734eea78343f) * (add) helix-core/src/test/java/org/apache/helix/task/TestGetSetUserContentStore.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java * (delete) helix-core/src/test/java/org/apache/helix/task/TestGetUserContentStore.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java > Support TaskDriver.addUserContent() api > --- > > Key: HELIX-772 > URL: https://issues.apache.org/jira/browse/HELIX-772 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Need to support adding user content in task driver > > AC: > * implement API > * add test > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-771) More detailed top state handoff metrics
[ https://issues.apache.org/jira/browse/HELIX-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669482#comment-16669482 ] Hudson commented on HELIX-771: -- FAILURE: Integrated in Jenkins build helix #1557 (See [https://builds.apache.org/job/helix/1557/]) [HELIX-771] More detailed top state handoff metrics (hrzhang: rev 7e49f995e29ea200fcc42ce6af148ed521979f5c) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java * (edit) helix-core/src/test/resources/TestTopStateHandoffMetrics.json * (add) helix-core/src/main/java/org/apache/helix/controller/stages/MissingTopStateRecord.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java * (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java * (add) helix-core/src/main/java/org/apache/helix/controller/stages/TopStateHandoffReportStage.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (edit) helix-core/src/test/java/org/apache/helix/monitoring/mbeans/TestTopStateHandoffMetrics.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/TaskGarbageCollectionStage.java > More detailed top state handoff metrics > --- > > Key: HELIX-771 > URL: https://issues.apache.org/jira/browse/HELIX-771 > Project: Apache Helix > Issue Type: Bug > Components: helix-core >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > To define top state handoff SLA, we need some more detailed data: > * graceful top state handoff (i.e. disable instance / resource / etc, both > Helix and e2e latency) > * abrupt top state handoff (i.e. node crash) > AC: > - prepare metrics, test, code complete -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-770) HELIX: Fix a possible NPE in loadBalance in IntermediateStateCalcStage
[ https://issues.apache.org/jira/browse/HELIX-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667951#comment-16667951 ] Hudson commented on HELIX-770: -- FAILURE: Integrated in Jenkins build helix #1555 (See [https://builds.apache.org/job/helix/1555/]) [HELIX-770] HELIX: Fix a possible NPE in loadBalance in (hulee: rev cf010f90426003dab7e713945c2c9daa23ffed13) * (edit) helix-core/src/main/java/org/apache/helix/model/StateModelDefinition.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java > HELIX: Fix a possible NPE in loadBalance in IntermediateStateCalcStage > -- > > Key: HELIX-770 > URL: https://issues.apache.org/jira/browse/HELIX-770 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > In isLoadBalanceDownwardForAllReplicas() in IntermediateStateCalcStage, > statePriorityMap was throwing a NPE because the partition contained a replica > in ERROR state, and the map did not have an entry for it. To amend the issue, > Venice added the ERROR state in the state model with a priority, and Helix > added checks to prevent NPEs. Changelist: 1. Add containsKey checks in > isLoadBalanceDownwardForAllReplicas() 2. Make the Controller correctly log > all partitions with ERROR state replicas 3. Add HelixDefinedStates in > statePriorityList if not already added -- This message was sent by Atlassian JIRA (v7.6.3#76005)
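Editor's sketch of why the containsKey checks mentioned above prevent the NPE: reading a missing key from a Map<String, Integer> returns null, and auto-unboxing that null into an int comparison throws NullPointerException. The class StatePrioritySketch, the method isDownward, and the convention that a smaller number means a higher priority are assumptions for illustration; this is not the IntermediateStateCalcStage code.

```java
import java.util.Map;

/** Hypothetical illustration of guarding priority lookups; not the Helix implementation. */
public final class StatePrioritySketch {
  private StatePrioritySketch() {}

  /**
   * Returns true if moving from currentState to targetState goes to a lower-priority state.
   * Assumption in this sketch: a smaller number means a higher priority.
   * States without an entry (for example ERROR) are treated as "not downward" instead of
   * triggering a NullPointerException when the missing Integer is unboxed.
   */
  public static boolean isDownward(
      Map<String, Integer> statePriorityMap, String currentState, String targetState) {
    if (!statePriorityMap.containsKey(currentState)
        || !statePriorityMap.containsKey(targetState)) {
      return false;
    }
    return statePriorityMap.get(currentState) < statePriorityMap.get(targetState);
  }
}
```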
[jira] [Commented] (HELIX-767) TASK: Remove quotaType fields from Workflow and Job Beans
[ https://issues.apache.org/jira/browse/HELIX-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667876#comment-16667876 ] Hudson commented on HELIX-767: -- FAILURE: Integrated in Jenkins build helix #1552 (See [https://builds.apache.org/job/helix/1552/]) [HELIX-767] TASK: Remove quotaType fields from Workflow and Job Beans (hulee: rev 1d32172a53224019b29a6098378daf066ee28e80) * (edit) helix-core/src/main/java/org/apache/helix/task/beans/WorkflowBean.java * (edit) helix-core/src/main/java/org/apache/helix/task/beans/JobBean.java * (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowConfig.java > TASK: Remove quotaType fields from Workflow and Job Beans > - > > Key: HELIX-767 > URL: https://issues.apache.org/jira/browse/HELIX-767 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > For a short while, we used quota type to denote the quota type of Task > Framework resources. However, we changed the design so that we are using the > workflowType and jobType fields respectively. There were places where the > quotaType field was left over in the codebase, and this RB cleans it up. > Changelist: > 1. Remove all quotaType fields from Bean classes > 2. Add the setting of workflowType in WorkflowBean when read into > Workflow.Builder -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-769) TASK2.0: Add PropertyKey APIs for new ZNode structure workflow/job paths
[ https://issues.apache.org/jira/browse/HELIX-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667878#comment-16667878 ] Hudson commented on HELIX-769: -- FAILURE: Integrated in Jenkins build helix #1552 (See [https://builds.apache.org/job/helix/1552/]) [HELIX-769] TASK2.0: Add PropertyKey APIs for new ZNode structure (hulee: rev 739adb0d6ed35d7281e1c4cbddcffda56a223689) * (edit) helix-core/src/main/java/org/apache/helix/PropertyKey.java * (edit) helix-core/src/main/java/org/apache/helix/PropertyType.java * (edit) helix-core/src/main/java/org/apache/helix/PropertyPathBuilder.java * (delete) helix-core/src/test/java/org/apache/helix/util/TestGetWorkflowContext.java * (add) helix-core/src/test/java/org/apache/helix/util/TestPropertyKeyGetPath.java > TASK2.0: Add PropertyKey APIs for new ZNode structure workflow/job paths > > > Key: HELIX-769 > URL: https://issues.apache.org/jira/browse/HELIX-769 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > As part of ZNode restructuring for Task Framework, we need a convenient way > to generate paths to read from and write to ZooKeeper. PropertyKey was > already being used for this purpose for the most part throughout Helix, so > these APIs were added in PropertyKey and PropertyPathBuilder. > Changelist: > 1. Add path generation APIs for Task Framework resources in PropertyKey > 2. Add a unit test for the new APIs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-768) TASK: Fix a bug in WorkflowAccessor
[ https://issues.apache.org/jira/browse/HELIX-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667877#comment-16667877 ] Hudson commented on HELIX-768: -- FAILURE: Integrated in Jenkins build helix #1552 (See [https://builds.apache.org/job/helix/1552/]) [HELIX-768] TASK: Fix a bug in WorkflowAccessor (hulee: rev 65bb35090a90679cc0973d142a1d9a27d1522bbe) * (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowConfig.java * (edit) helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/WorkflowAccessor.java > TASK: Fix a bug in WorkflowAccessor > --- > > Key: HELIX-768 > URL: https://issues.apache.org/jira/browse/HELIX-768 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > There was a bug in WorkflowAccessor where, when a JSON was submitted for PUT > requests, it would create a WorkflowConfig based on the JSON but fail to > create a workflow with the config. This was preventing users from creating > workflows via REST APIs properly. > Changelist: > 1. Ensure that the configs submitted are reflected in the workflow being > created -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-765) [TASK] Build quota profile from scratch every rebalance
[ https://issues.apache.org/jira/browse/HELIX-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667536#comment-16667536 ] Hudson commented on HELIX-765: -- FAILURE: Integrated in Jenkins build helix #1547 (See [https://builds.apache.org/job/helix/1547/]) [HELIX-765] TASK: Build quota profile from scratch every rebalance (hulee: rev 930a4b7ae7eb63be0a751a593ba630ae55fb2cfb) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java > [TASK] Build quota profile from scratch every rebalance > --- > > Key: HELIX-765 > URL: https://issues.apache.org/jira/browse/HELIX-765 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > It has been reported that instances have a full quota despite no tasks > existing in their CURRENTSTATES. The cause of this is not clear, so making > ClusterDataCache trigger a refresh of all AssignableInstances will ensure > that there aren't situations where it looks like there has been a thread > leak. Optimizations will be implemented if necessary. Changelist: 1. Make > AssignableInstanceManager build all AssignableInstances from scratch every > rebalance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-766) [TASK] Add logging functionality in AssignableInstanceManager
[ https://issues.apache.org/jira/browse/HELIX-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667537#comment-16667537 ] Hudson commented on HELIX-766: -- FAILURE: Integrated in Jenkins build helix #1547 (See [https://builds.apache.org/job/helix/1547/]) [HELIX-766] TASK: Add logging functionality in AssignableInstanceManager (hulee: rev 5033785c231af363953367f65f77513911b753f5) * (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java > [TASK] Add logging functionality in AssignableInstanceManager > - > > Key: HELIX-766 > URL: https://issues.apache.org/jira/browse/HELIX-766 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > In order to debug task-related inquiries and issues, we realized that it > would be very helpful if there was a log recording the current > quota capacity of all AssignableInstances. This is for cases where we see > jobs whose tasks are not getting assigned so that we could quickly rule out > the possibility of bugs in quota-based scheduling. > Changelist: > 1. Add a method that logs current quota profile in a JSON format with an > option flag of only displaying when there are quota types whose capacities > are full > 2. Add info logs in AssignableInstanceManager -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-764) [TASK] Fix LiveInstanceCurrentState change flag
[ https://issues.apache.org/jira/browse/HELIX-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667535#comment-16667535 ] Hudson commented on HELIX-764: -- FAILURE: Integrated in Jenkins build helix #1547 (See [https://builds.apache.org/job/helix/1547/]) [HELIX-764] TASK: Fix LiveInstanceCurrentState change flag (hulee: rev d33d9efea25fe9d29e84a4ce7614b544ef2d) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java > [TASK] Fix LiveInstanceCurrentState change flag > --- > > Key: HELIX-764 > URL: https://issues.apache.org/jira/browse/HELIX-764 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Previously, existsLiveInstanceOrCurrentStateChange was getting reset in > ClusterDataCache when its getter was called. This was problematic because if > there were multiple jobs or multiple workflows, whoever called this getter > first would get the correct flag value, and the ensuing callers would get a false > because the flag would have been reset. This RB fixes that bug by resetting > the flag right at the beginning of the refresh() call in ClusterDataCache, so > that all callers during that pipeline get the same, correct value. > Changelist: > 1. Change the getter so that it does not reset the flag; instead, reset the > flag in the beginning of refresh() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
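The change described in HELIX-764 can be illustrated with a minimal sketch. The class ChangeFlagCache and the changeObserved parameter are hypothetical and are not taken from ClusterDataCache; only the idea (a getter without side effects, with the reset moved to the start of refresh()) comes from the description above.

```java
/** Hypothetical stand-in for a cache that exposes a per-pipeline change flag. */
final class ChangeFlagCache {
    private boolean existsLiveInstanceOrCurrentStateChange;

    /** Resets the flag once at the start of each refresh, then records what this refresh saw. */
    void refresh(boolean changeObserved) {
        existsLiveInstanceOrCurrentStateChange = false;
        if (changeObserved) {
            existsLiveInstanceOrCurrentStateChange = true;
        }
    }

    /** Getter without side effects: every caller in the same pipeline reads the same value. */
    boolean getExistsLiveInstanceOrCurrentStateChange() {
        return existsLiveInstanceOrCurrentStateChange;
    }
}
```

With this shape, the second and later callers in a pipeline run no longer see a flag that an earlier caller has already cleared.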
[jira] [Commented] (HELIX-762) [TASK] Change LOG mode from info to debug
[ https://issues.apache.org/jira/browse/HELIX-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667533#comment-16667533 ] Hudson commented on HELIX-762: -- FAILURE: Integrated in Jenkins build helix #1547 (See [https://builds.apache.org/job/helix/1547/]) [HELIX-762] TASK: Change LOG mode from info to debug (hulee: rev e7b960c22896c08337292d20f674f20a7f1391d0) * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java * (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java > [TASK] Change LOG mode from info to debug > - > > Key: HELIX-762 > URL: https://issues.apache.org/jira/browse/HELIX-762 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > In production, it was observed that some users were running thousands of > tasks, and since AssignableInstance leaves a line of log for each task > assigned or released, the amount of log that was being generated was too > much, and it was too verbose. > Changelist: > 1. Change the logging mode from info to debug in AssignableInstance and > AssignableInstanceManager -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-753) Record top state handoff finished in single cluster data cache refresh
[ https://issues.apache.org/jira/browse/HELIX-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664402#comment-16664402 ] Hudson commented on HELIX-753: -- FAILURE: Integrated in Jenkins build helix #1545 (See [https://builds.apache.org/job/helix/1545/]) [HELIX-753] Record top state handoff finished in single cluster data (hrzhang: rev 67ff66b4897309c785b8b42863e95734eba81aab) * (edit) helix-core/src/test/java/org/apache/helix/monitoring/mbeans/TestTopStateHandoffMetrics.java * (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterEvent.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java * (edit) helix-core/src/test/resources/TestTopStateHandoffMetrics.json * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java > Record top state handoff finished in single cluster data cache refresh > -- > > Key: HELIX-753 > URL: https://issues.apache.org/jira/browse/HELIX-753 > Project: Apache Helix > Issue Type: Bug >Reporter: Harry Zhang >Assignee: Harry Zhang >Priority: Major > > Currently we are calculating top state handoff duration by doing the > following: > - record missing top state when we see a top state missing > - record top state come back when we see it come back > - report top state handoff duration > This is perfectly fine for non-P2P state transitions as the entire top state > handoff process will always finish for >= 2 pipeline runs. However, for P2P > enabled clusters, top state handoffs are quick, and if a handoff is quicker than > the cluster data refresh stage latency, we will lose a lot of short top state > handoffs, which makes the number miserable on ingraph. > We need to revise the top state handoff metrics implementation so we don't lose > data points statistically (i.e. we are losing all short handoffs now). > AC: > - revise impl so we catch those short top state hand-offs > - write new tests to catch the fix if needed -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-751) TASK: Fix AssignableInstanceComparator so that it sorts unsupported quota types
[ https://issues.apache.org/jira/browse/HELIX-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624163#comment-16624163 ] Hudson commented on HELIX-751: -- FAILURE: Integrated in Jenkins build helix #1538 (See [https://builds.apache.org/job/helix/1538/]) [HELIX-751] TASK: Fix AssignableInstanceComparator so that it sorts (hulee: rev 11b721350091247cd3aa9716345ebdd53199ab53) * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java > TASK: Fix AssignableInstanceComparator so that it sorts unsupported quota > types > --- > > Key: HELIX-751 > URL: https://issues.apache.org/jira/browse/HELIX-751 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Currently, if the quota type does not exist, it will not sort > AssignableInstances based on availability. This does not cause immediate > problems, but it would be nice to have them sorted because we now allow > unsupported quota types run as DEFAULT type. > Changelist: > 1. Comparator sorts AssignableInstances in a PriorityQueue by DEFAULT type's > availability when the quota type given is unsupported -- This message was sent by Atlassian JIRA (v7.6.3#76005)
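The fallback described in HELIX-751 might look roughly like the sketch below. Everything here is an assumption made for illustration: the Instance model, the availableFor() helper, and the choice to put the most available instance at the head of the queue are not taken from ThreadCountBasedTaskAssigner. Only the rule "sort by the DEFAULT type's availability when the given quota type is unsupported" comes from the description.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

final class QuotaAwareOrdering {
    static final String DEFAULT_TYPE = "DEFAULT";

    /** Hypothetical instance model: a name plus free thread count per quota type. */
    static final class Instance {
        final String name;
        final Map<String, Integer> availableThreads;

        Instance(String name, Map<String, Integer> availableThreads) {
            this.name = name;
            this.availableThreads = availableThreads;
        }

        /** Falls back to the DEFAULT type when the requested quota type is not known. */
        int availableFor(String quotaType) {
            String effectiveType = availableThreads.containsKey(quotaType) ? quotaType : DEFAULT_TYPE;
            return availableThreads.getOrDefault(effectiveType, 0);
        }
    }

    /** Builds a queue whose head is the instance with the most free threads (assumed order). */
    static PriorityQueue<Instance> queueFor(String quotaType) {
        Comparator<Instance> byAvailability =
            Comparator.comparingInt((Instance i) -> i.availableFor(quotaType));
        return new PriorityQueue<>(byAvailability.reversed());
    }
}
```

Because the fallback lives in availableFor(), an unsupported type still yields a meaningful ordering instead of leaving the instances unsorted.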
[jira] [Commented] (HELIX-752) Add missing shutdown for RoutingTableProvider
[ https://issues.apache.org/jira/browse/HELIX-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624140#comment-16624140 ] Hudson commented on HELIX-752: -- FAILURE: Integrated in Jenkins build helix #1537 (See [https://builds.apache.org/job/helix/1537/]) [HELIX-752] Add missing shutdown for RoutingTableProvider (lxia: rev 580f1facc349cfcaa8cab93de3409834dd592ac4) * (edit) helix-core/src/test/java/org/apache/helix/integration/spectator/TestRoutingTableProvider.java > Add missing shutdown for RoutingTableProvider > - > > Key: HELIX-752 > URL: https://issues.apache.org/jira/browse/HELIX-752 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Assignee: Hunter L >Priority: Major > > Changelist: > 1. Add a missing shutdown() call to avoid having a background thread keep > printing out error messages -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-745) [TASK] Make AssignableInstanceManager listen on data changes to update AssignableInstances
[ https://issues.apache.org/jira/browse/HELIX-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554969#comment-16554969 ] Hudson commented on HELIX-745: -- FAILURE: Integrated in Jenkins build helix #1527 (See [https://builds.apache.org/job/helix/1527/]) [HELIX-745] Make AssignableInstanceManager listen on data changes to (narendly: rev 0af6e8c19af5ee916f93acd8582e53b776e9c712) * (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java * (edit) helix-core/src/main/java/org/apache/helix/common/caches/TaskDataCache.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java > [TASK] Make AssignableInstanceManager listen on data changes to update > AssignableInstances > -- > > Key: HELIX-745 > URL: https://issues.apache.org/jira/browse/HELIX-745 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Previously, although AssignableInstanceManager provided an API for updating > its AssignableInstances, this API was not being called at all. This RB fixes > this. > Changelist: > 1. Add a boolean flag in ClusterDataCache for LiveInstance, ClusterConfig, > InstanceConfig changes > 2. If the ClusterDataCache is a taskDataCache, call > AssignableInstanceManager.updateAssignableInstances() when the said boolean > flag is true > 3. Use thread-safe map in AssignableInstanceManager > 4. Address the issue of targeted tasks having null taskIds (use pName > convention instead) > 5. Address the issue of LiveInstanceChange not notifying the caches by > explicitly using setLiveInstance() function > 6. Fix bug in restoreTaskAssignResult where tasks with null quota type were > not being restored properly -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-746) [TASK] Fix removeJob so its behavior is more consistent with removeWorkflow
[ https://issues.apache.org/jira/browse/HELIX-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554970#comment-16554970 ] Hudson commented on HELIX-746: -- FAILURE: Integrated in Jenkins build helix #1527 (See [https://builds.apache.org/job/helix/1527/]) [HELIX-746] Fix removeJob so its behavior is more consistent with (narendly: rev ae23842d24409e26c67f5c99113762bd0eb714b0) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java > [TASK] Fix removeJob so its behavior is more consistent with removeWorkflow > --- > > Key: HELIX-746 > URL: https://issues.apache.org/jira/browse/HELIX-746 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Change the behavior of removeJob() so that it's more consistent with > removeWorkflow(). This RB addresses the scenario: suppose config deletion > failed (so config still exists) and context deletion succeeded. Then the > Controller has no way of knowing whether this job has ever been scheduled or > it was meant to be deleted (although failed due to partial deletion). > Returning as soon as config deletion fails can prevent this scenario. > Changelist: > 1. Make removeJob() return early as soon as a ZNode write failure is detected -- This message was sent by Atlassian JIRA (v7.6.3#76005)
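The early-return behaviour described in HELIX-746 can be sketched as follows. This is not the code in TaskUtil: the JobRemover class, the deleteZNode callback, and the order "config first, then context" are assumptions chosen to match the scenario in the description, where a failed config deletion must not be followed by a successful context deletion.

```java
import java.util.function.Predicate;

final class JobRemover {
    /**
     * Deletes the job config first and stops as soon as a deletion fails.
     * deleteZNode is a hypothetical callback that returns true when the ZNode was removed.
     */
    static boolean removeJob(Predicate<String> deleteZNode, String configPath, String contextPath) {
        if (!deleteZNode.test(configPath)) {
            // Config still exists, so leave the context untouched and report the failure.
            return false;
        }
        return deleteZNode.test(contextPath);
    }
}
```

Returning before the context is touched keeps config and context consistent, so a later retry sees the job in the same state as before the failed attempt.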
[jira] [Commented] (HELIX-744) [TASK] Allow undefined workflow/job types to be assigned as DEFAULT type
[ https://issues.apache.org/jira/browse/HELIX-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554966#comment-16554966 ] Hudson commented on HELIX-744: -- FAILURE: Integrated in Jenkins build helix #1526 (See [https://builds.apache.org/job/helix/1526/]) [HELIX-744] Allow undefined workflow/job types to be assigned as DEFAULT (narendly: rev 6759040956907b197b6d65cb8b4759b7e8981883) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java * (edit) helix-core/src/test/java/org/apache/helix/task/assigner/TestAssignableInstance.java > [TASK] Allow undefined workflow/job types to be assigned as DEFAULT type > > > Key: HELIX-744 > URL: https://issues.apache.org/jira/browse/HELIX-744 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Previously, we ignored undefined types, that is workflow/job types that are > not defined in ClusterConfig. This is not backward-compatible because some > users of Task Framework are setting types without any quota-related config > set in ClusterConfig. The default behavior was changed so that each > AssignableInstance will just treat these workflows/jobs as DEFAULT type, > which will make quota-based scheduling backward-compatible. Changelist: 1. > AssignableInstance treats undefined types as DEFAULT 2. Appropriate log > messages and logic change was applied to restoreTaskAssignResult logic 3. A > test case was added to TestQuotaBasedScheduling -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-743) [TASK] Fix purgeExpiredJobs() so that jobs whose removal has failed do not get removed from DAG
[ https://issues.apache.org/jira/browse/HELIX-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554826#comment-16554826 ] Hudson commented on HELIX-743: -- FAILURE: Integrated in Jenkins build helix #1525 (See [https://builds.apache.org/job/helix/1525/]) [HELIX-743] Fix purgeExpiredJobs() so that jobs whose removal has failed (narendly: rev c012c7b9dea35137935ac29c035a96aea570bf9c) * (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowRebalancer.java > [TASK] Fix purgeExpiredJobs() so that jobs whose removal has failed do not > get removed from DAG > --- > > Key: HELIX-743 > URL: https://issues.apache.org/jira/browse/HELIX-743 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Previously, even if the job removal had failed, Task Framework would go ahead > and remove the job from the DAG. This would cause some ZNodes to be left over > and never be cleaned up at next purge time. Changelist: 1. Keep track of jobs > whose removal failed and remove them from expiredJobs so that next call to > purgeExpiredJobs(), the job would be included in expiredJobs again and > removal would be tried again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
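A minimal sketch of the bookkeeping described in HELIX-743 is given below. The ExpiredJobPurger class and the removeJob callback are hypothetical and do not reproduce WorkflowRebalancer; the sketch only shows the idea of tracking failed removals and excluding them from the set of jobs that is dropped from the DAG.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

final class ExpiredJobPurger {
    /**
     * Tries to remove every expired job and returns only those whose removal succeeded.
     * Jobs that failed stay in the DAG, so the next purge picks them up as expired again.
     */
    static Set<String> purge(Set<String> expiredJobs, Predicate<String> removeJob) {
        Set<String> failedJobs = new HashSet<>();
        for (String job : expiredJobs) {
            if (!removeJob.test(job)) {
                failedJobs.add(job);
            }
        }
        Set<String> removedJobs = new HashSet<>(expiredJobs);
        removedJobs.removeAll(failedJobs);
        return removedJobs;
    }
}
```

The caller would then remove only the returned jobs from the DAG, which avoids leaving ZNodes behind for jobs the DAG no longer knows about.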
[jira] [Commented] (HELIX-741) Revise unreliable behavior in swapInstance
[ https://issues.apache.org/jira/browse/HELIX-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553440#comment-16553440 ] Hudson commented on HELIX-741: -- FAILURE: Integrated in Jenkins build helix #1518 (See [https://builds.apache.org/job/helix/1518/]) [HELIX-741] make swap instance more robust and idempotent (hrzhang: rev 24c52394dfff91c045367260c969f76560ebeb62) * (edit) helix-core/src/main/java/org/apache/helix/tools/ClusterSetup.java * (edit) helix-core/src/test/java/org/apache/helix/integration/TestSwapInstance.java > Revise unreliable behavior in swapInstance > -- > > Key: HELIX-741 > URL: https://issues.apache.org/jira/browse/HELIX-741 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > The swapInstance call did not work properly when we were trying to fix a > production issue. > > The API was old and not actively maintained. It used a deprecated underlying > data accessor API and hit a problem of partial ZK read. Thus our CLI was > unable to update all IdealStates as expected. > We have seen such a problem before, especially when the cluster is big and > there is a lot of data to read back. > This ticket is created to refactor the implementation of swapInstance() to > make it more robust, and a separate ticket will be created to revise those old > API calls that are not frequently used and not actively maintained. > – > AC: > - make this api call reliable and idempotent -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-738) [TASK] Remove quotaType APIs and make jobs inherit type from workflows
[ https://issues.apache.org/jira/browse/HELIX-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547378#comment-16547378 ] Hudson commented on HELIX-738: -- FAILURE: Integrated in Jenkins build helix #1516 (See [https://builds.apache.org/job/helix/1516/]) [HELIX-738] Remove quotaType APIs and make jobs inherit type from (narendly: rev 36ab2a6028dad39b32d3a15da942b4385ff9fd1d) * (edit) helix-core/src/main/java/org/apache/helix/task/ThreadCountBasedTaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowConfig.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobConfig.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskAssignmentCalculator.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobAndWorkflowType.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTermination.java * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java * (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java * (edit) helix-core/src/main/java/org/apache/helix/task/FixedTargetTaskAssignmentCalculator.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java > [TASK] Remove quotaType APIs and make jobs inherit type from workflows > -- > > Key: HELIX-738 > URL: https://issues.apache.org/jira/browse/HELIX-738 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > For quota-based task scheduling, for each job, we provided get/setQuotaType > APIs. However, the use cases for workflow types and job types were similar > enough that we decided to merge them and begin using workflow/job types for > quota-based scheduling. Job types will now be used as quota types, and all > jobs will inherit the type, if set, from their parent workflow, at assignment > and schedule time. > Changelist: > 1. Remove APIs around quotaType in Workflow/JobConfig > 2. Add an internal method in TaskAssignmentCalculator that includes logic for > determining which quota type each job should use > 3. Adjust tests so that they test and pass successfully -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-740) ZkHelixAdmin:NPE
[ https://issues.apache.org/jira/browse/HELIX-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547380#comment-16547380 ] Hudson commented on HELIX-740: -- FAILURE: Integrated in Jenkins build helix #1516 (See [https://builds.apache.org/job/helix/1516/]) [HELIX-740] check NPE in getInstancesInClusterWithTag and throw more (hrzhang: rev f4bb7d60782150c7d713c907211cc9d41f002c48) * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java > ZkHelixAdmin:NPE > > > Key: HELIX-740 > URL: https://issues.apache.org/jira/browse/HELIX-740 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > The NPE occurs in this line: > [https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java#L669] > Basically, we ended up in a situation where we had an instance whose config > was deleted. The line above should handle this more gracefully; we need more > meaningful error information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-739) [TASK] Remove old comments from TestQuotaBasedScheduling
[ https://issues.apache.org/jira/browse/HELIX-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547379#comment-16547379 ] Hudson commented on HELIX-739: -- FAILURE: Integrated in Jenkins build helix #1516 (See [https://builds.apache.org/job/helix/1516/]) [HELIX-739] Remove old comments from TestQuotaBasedScheduling (narendly: rev abfb894e508938ee51c73fb8abcb02f7615386e0) * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java > [TASK] Remove old comments from TestQuotaBasedScheduling > > > Key: HELIX-739 > URL: https://issues.apache.org/jira/browse/HELIX-739 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Remove legacy comments that are no longer true to prevent confusion in the > future. > Changelist: > 1. Remove old comments -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-735) Make AssignmentCalculators non-static so that tests pass
[ https://issues.apache.org/jira/browse/HELIX-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547171#comment-16547171 ] Hudson commented on HELIX-735: -- FAILURE: Integrated in Jenkins build helix #1515 (See [https://builds.apache.org/job/helix/1515/]) [HELIX-735] Make AssignmentCalculators non-static so that tests pass (narendly: rev cc625065bffced9a66566eeccb3055ec28a74611) * (edit) helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java * (edit) helix-core/src/test/java/org/apache/helix/task/TestAssignableInstanceManagerControllerSwitch.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskThrottling.java > Make AssignmentCalculators non-static so that tests pass > > > Key: HELIX-735 > URL: https://issues.apache.org/jira/browse/HELIX-735 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > With the introduction of quota-based scheduling, every task that gets > scheduled takes up a thread. However, previously these AssignmentCalculators > (both generic and fixed for generic jobs and targeted jobs) were stateless so > they were instantiated statically. Since AssignmentCalculators now are > stateful due to them operating on AssignableInstances' quota profile, they > were made non-static so that they would be re-instantiated every pipeline. > This problem is specific to the testing environment where static variables > live on from test to test, causing AssignmentCalculators to hold on to the > very first reference to AssignableInstanceManager. Tasks were not being > assigned and scheduled because the first set of AssignableInstances would get > filled up and never get freed. > Changelist: > 1. Make AssignmentCalculators non-static > 2. Adjust sleep duration for some tests for stability -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-736) Modify TestGetLastScheduledTaskTimestamp for increased stability
[ https://issues.apache.org/jira/browse/HELIX-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547170#comment-16547170 ] Hudson commented on HELIX-736: -- FAILURE: Integrated in Jenkins build helix #1515 (See [https://builds.apache.org/job/helix/1515/]) [HELIX-736] Modify TestGetLastScheduledTaskTimestamp for increased (narendly: rev 047ad51e8b243dbbd268aa9ba954623949eba27d) * (edit) helix-core/src/test/java/org/apache/helix/task/TestGetLastScheduledTaskTimestamp.java > Modify TestGetLastScheduledTaskTimestamp for increased stability > > > Key: HELIX-736 > URL: https://issues.apache.org/jira/browse/HELIX-736 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > This test was experiencing an intermittent failure. No inherent faults of its > own, but sometimes tasks were not being given enough resource/time to be > scheduled and register the timestamp, which is expected depending on how fast > the system is running. One area of improvement was that > TestGetLastScheduledTaskTimestamp was using a long value 0 for invalid or > unscheduled timestamps. It will use -1L from now on, which is the invalid > flag TaskDriver uses. > Changelist: > 1. Change the flag for invalid or unscheduled timestamps from 0 to -1L in > TestGetLastScheduledTaskTimestamp -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-737) [ROUTER] Expose ExternalViews in RoutingTable and RoutingTableSnapshot
[ https://issues.apache.org/jira/browse/HELIX-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547172#comment-16547172 ] Hudson commented on HELIX-737: -- FAILURE: Integrated in Jenkins build helix #1515 (See [https://builds.apache.org/job/helix/1515/]) [HELIX-737] Expose ExternalViews in RoutingTable and (narendly: rev 3ba447f97a10692981ad55525fc0e1ea55baf2b9) * (edit) helix-core/src/main/java/org/apache/helix/spectator/RoutingTableProvider.java * (edit) helix-core/src/main/java/org/apache/helix/spectator/RoutingTableSnapshot.java * (edit) helix-core/src/test/java/org/apache/helix/integration/spectator/TestRoutingTableSnapshot.java * (edit) helix-core/src/main/java/org/apache/helix/spectator/RoutingTable.java > [ROUTER] Expose ExternalViews in RoutingTable and RoutingTableSnapshot > -- > > Key: HELIX-737 > URL: https://issues.apache.org/jira/browse/HELIX-737 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > This was a user request. > > Changelist: 1. Add getExternalViews() in RoutingTable 2. Add > getExternalViews() in RoutingTableProvider 3. Cache ExternalViews in > RoutingTable 4. Add an ExternalView test in TestRoutingTableSnapshot -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-734) Fix test configs for TestRecoveryLoadBalance
[ https://issues.apache.org/jira/browse/HELIX-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547168#comment-16547168 ] Hudson commented on HELIX-734: -- FAILURE: Integrated in Jenkins build helix #1515 (See [https://builds.apache.org/job/helix/1515/]) [HELIX-734] Fix test configs for TestRecoveryLoadBalance (narendly: rev 811222654a255924edef768bf5d4000c6f146f1f) * (edit) helix-core/src/test/resources/TestRecoveryLoadBalance.OnlineOffline.json * (edit) helix-core/src/test/resources/TestRecoveryLoadBalance.MasterSlave.json > Fix test configs for TestRecoveryLoadBalance > > > Key: HELIX-734 > URL: https://issues.apache.org/jira/browse/HELIX-734 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > There was a fix checked in that changed the comparison for NumberOfErrorOrRecoveryPartitionThreshold in > IntermediateStateCalcStage from less-than-or-equal-to to strictly less-than. > Because of this, we need to change the config parameters (mostly from 1 to > 0). There is no other underlying logic change. > Changelist: > 1. Change config parameters appropriately in test case JSON files from 1 to 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-731) Fix TestStateTransitionThrottle and comparison operator change
[ https://issues.apache.org/jira/browse/HELIX-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545852#comment-16545852 ] Hudson commented on HELIX-731: -- FAILURE: Integrated in Jenkins build helix #1514 (See [https://builds.apache.org/job/helix/1514/]) [HELIX-731] Fix TestStateTransitionThrottle and comparison operator (narendly: rev 343f8dd33d49a74e6def97a9e58b4adf6b007ee3) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java * (edit) helix-core/src/test/java/org/apache/helix/integration/TestStateTransitionThrottle.java > Fix TestStateTransitionThrottle and comparison operator change > -- > > Key: HELIX-731 > URL: https://issues.apache.org/jira/browse/HELIX-731 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Due to the change made in relation to allowing downward state transitions to > take place while error or recovery balance transitions are present, this test > was failing due to the change in the assumption. Parameters were adjusted, > and test conditions were modified such that it is testing the new assumptions > correctly. > Changelist: 1. TestStateTransitionThrottle assert statements were modified so > that it assumes downward load balance transitions taking place 2. Comparison > operator in IntermediateStateCalcState to make it more strictly > backward-compatible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-733) Fix TestAssignableInstanceManagerControllerSwitch
[ https://issues.apache.org/jira/browse/HELIX-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545854#comment-16545854 ] Hudson commented on HELIX-733: -- FAILURE: Integrated in Jenkins build helix #1514 (See [https://builds.apache.org/job/helix/1514/]) [HELIX-733] Fix TestAssignableInstanceManagerControllerSwitch (narendly: rev bd171f26d164dc33ac91fc393d11de269c2665b0) * (edit) helix-core/src/test/java/org/apache/helix/task/TestAssignableInstanceManagerControllerSwitch.java > Fix TestAssignableInstanceManagerControllerSwitch > - > > Key: HELIX-733 > URL: https://issues.apache.org/jira/browse/HELIX-733 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > This test uses RoutingTableProvider. It was not being shut down, causing an > ExecutorService thread to continue executing periodic updates. This RB makes > it so that it is shut down at the end of the test. > Changelist: > 1. Add shutdown() on RoutingTableProvider at the end of the test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
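The shutdown pattern described in HELIX-733 can be sketched roughly as follows. This is a minimal illustration and not the Helix test itself: PeriodicRefresher is a hypothetical stand-in for a component that, like RoutingTableProvider in the description, owns an ExecutorService thread running periodic updates, and the test body is elided.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical component that keeps a background thread alive until shut down. */
final class PeriodicRefresher {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicInteger refreshCount = new AtomicInteger();

  PeriodicRefresher(long periodMs) {
    // The periodic update keeps running until shutdown() is called.
    executor.scheduleAtFixedRate(
        () -> refreshCount.incrementAndGet(), 0, periodMs, TimeUnit.MILLISECONDS);
  }

  void shutdown() {
    executor.shutdownNow();
  }

  boolean isShutdown() {
    return executor.isShutdown();
  }
}

/** Hypothetical test skeleton: the component is always shut down at the end. */
final class RefresherTestSketch {
  static boolean runTestBody() {
    PeriodicRefresher refresher = new PeriodicRefresher(50);
    try {
      // ... the assertions of the real test would go here ...
    } finally {
      // Without this call the executor thread would outlive the test.
      refresher.shutdown();
    }
    return refresher.isShutdown();
  }
}
```

Placing the call in a finally block is a design choice of this sketch, so that the thread is stopped even when an assertion fails; the description above only says the shutdown happens at the end of the test.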
[jira] [Commented] (HELIX-732) [TASK] Expose UserContentStore in TaskDriver
[ https://issues.apache.org/jira/browse/HELIX-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545853#comment-16545853 ] Hudson commented on HELIX-732: -- FAILURE: Integrated in Jenkins build helix #1514 (See [https://builds.apache.org/job/helix/1514/]) [HELIX-732] Expose UserContentStore in TaskDriver (narendly: rev 3ec93129e717185cd1db0fc40d09e7603d8aee5d) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskDriver.java * (edit) helix-core/src/main/java/org/apache/helix/task/UserContentStore.java * (add) helix-core/src/test/java/org/apache/helix/task/TestGetUserContentStore.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java > [TASK] Expose UserContentStore in TaskDriver > > > Key: HELIX-732 > URL: https://issues.apache.org/jira/browse/HELIX-732 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > There was a user request for this feature. The intended use is to allow > aggregation work that reads temporary data written by tasks, by allowing a > get() of UserContentStore at the TaskDriver level. UserContentStore is a > potentially useful feature that is currently under-utilized - this will > enable Gobblin and other users of Task Framework to better utilize > UserContentStore. > Changelist: > 1. Add getUserContentStore() in TaskDriver > 2. Add TestUserContentStore, an integration test for this feature > 3. Add descriptive JavaDoc warning the user that the get() and put() methods of > UserContentStore are not thread-safe -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-730) [TASK] Add ThreadCountBasedAssignmentCalculator and integrate with Workflow/JobRebalancer and fix rebalancing logic
[ https://issues.apache.org/jira/browse/HELIX-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545636#comment-16545636 ] Hudson commented on HELIX-730: -- FAILURE: Integrated in Jenkins build helix #1512 (See [https://builds.apache.org/job/helix/1512/]) [HELIX-730] Add ThreadCountBasedAssignmentCalculator and integrate with (narendly: rev 4db61b56e473b64ec9956f694dd2ac6a8d328ed4) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskRebalancer.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTimeout.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeoutTaskNotStarted.java * (add) helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java * (add) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobConfig.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureTaskNotStarted.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerRetryLimit.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowJobDependency.java * (edit) helix-core/src/test/java/org/apache/helix/task/TestSemiAutoStateTransition.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailure.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java * (edit) helix-core/src/main/java/org/apache/helix/task/FixedTargetTaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestTargetExternalView.java * 
(edit) helix-core/src/main/java/org/apache/helix/task/WorkflowRebalancer.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTermination.java * (edit) helix-core/src/test/java/org/apache/helix/integration/TestBatchEnableInstances.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobTimeout.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestJobFailureHighThreshold.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteWorkflow.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestRebalanceRunningTask.java * (edit) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java * (edit) helix-core/src/test/java/org/apache/helix/integration/TestStateTransitionCancellation.java * (edit) helix-core/src/test/java/org/apache/helix/integration/controller/TestClusterMaintenanceMode.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRetryDelay.java * (edit) helix-core/src/test/java/org/apache/helix/integration/manager/TestZkHelixAdmin.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestStopWorkflow.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerFailover.java * (add) helix-core/src/main/java/org/apache/helix/task/ThreadCountBasedTaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskUtil.java * (delete) 
helix-core/src/test/java/org/apache/helix/integration/task/TestGenericTaskAssignmentCalculator.java * (edit) helix-core/src/test/java/org/apache/helix/task/TaskSynchronizedTestBase.java > [TASK] Add ThreadCountBasedAssignmentCalculator and integrate with > Workflow/JobRebalancer and fix rebalancing logic > --- > > Key: HELIX-730 > URL: https://issues.apache.org/jira/browse/HELIX-730 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > For quota-based scheduling of tasks, we have added the TaskAssigner interface > that takes into account AssignableInstances by way of > AssignableInstanceManager. In order to use this in the currently-existing > pipeline prior to Task Framework 2.0, GenericTaskAssignmentCalculator was > replaced with ThreadCou
[jira] [Commented] (HELIX-719) [HELIX] Verify downward load balance and fix TestPartitionMovementThrottle
[ https://issues.apache.org/jira/browse/HELIX-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539260#comment-16539260 ] Hudson commented on HELIX-719: -- FAILURE: Integrated in Jenkins build helix #1498 (See [https://builds.apache.org/job/helix/1498/]) [HELIX-719] [HELIX] Verify downward load balance and fix (narendly: rev 030706c96968745c8b459f745ddc31e8458ddf99) * (edit) helix-core/src/test/java/org/apache/helix/integration/TestPartitionMovementThrottle.java > [HELIX] Verify downward load balance and fix TestPartitionMovementThrottle > -- > > Key: HELIX-719 > URL: https://issues.apache.org/jira/browse/HELIX-719 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > TestPartitionMovementThrottle was failing after the improvement was made in > IntermediateCalcStage so that downward load balance will take place while > recovery balance is happening. In the process of fixing the test, 1. It was > verified by hand that downward load balance is being correctly throttled as > defined by the user in StateTransitionThrottleConfig. 2. An appropriate > parameter adjustment was made to account for both recovery and load balance > happening in the same pipeline iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-723) [TEST] Remove null statement from AssignableInstanceManagerControllerSwitch
[ https://issues.apache.org/jira/browse/HELIX-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539124#comment-16539124 ] Hudson commented on HELIX-723: -- FAILURE: Integrated in Jenkins build helix #1497 (See [https://builds.apache.org/job/helix/1497/]) [HELIX-723] Remove null statement from (narendly: rev c35551c374a852d89b4ccbe5efd43cb395e33a68) * (edit) helix-core/src/test/java/org/apache/helix/task/TestAssignableInstanceManagerControllerSwitch.java > [TEST] Remove null statement from AssignableInstanceManagerControllerSwitch > --- > > Key: HELIX-723 > URL: https://issues.apache.org/jira/browse/HELIX-723 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Controller was set to null during development of the test and never got > removed after finishing writing the test, which caused an NPE. > Changelist: > 1. Remove an old null statement in the test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-722) Add quotaType field to WorkflowConfig and JobConfig
[ https://issues.apache.org/jira/browse/HELIX-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539123#comment-16539123 ] Hudson commented on HELIX-722: -- FAILURE: Integrated in Jenkins build helix #1497 (See [https://builds.apache.org/job/helix/1497/]) [HELIX-722] Add quotaType field to WorkflowConfig and JobConfig (narendly: rev 35fcfa0ec38e382cb7d4d981abdd4b5dcea11338) * (edit) helix-core/src/main/java/org/apache/helix/task/beans/JobBean.java * (edit) helix-core/src/main/java/org/apache/helix/task/beans/WorkflowBean.java * (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowConfig.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobConfig.java > Add quotaType field to WorkflowConfig and JobConfig > --- > > Key: HELIX-722 > URL: https://issues.apache.org/jira/browse/HELIX-722 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > WorkflowConfig and JobConfig define workflows and jobs respectively. In order > to support job scheduling based on quota types, we need to associate > workflows and jobs with quota types and provide APIs for get/set accordingly. > ChangeList: > 1. Workflow and Job Config have APIs added for quota type support > 2. Code formatting per Helix code formatter -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-721) Clean up code in ClusterDataCache
[ https://issues.apache.org/jira/browse/HELIX-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539122#comment-16539122 ] Hudson commented on HELIX-721: -- FAILURE: Integrated in Jenkins build helix #1497 (See [https://builds.apache.org/job/helix/1497/]) [HELIX-721] Clean up code in ClusterDataCache (narendly: rev 698532598f2cb10d0eb1c67f3961cdd6219db965) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java * (edit) helix-core/src/main/java/org/apache/helix/common/caches/TaskDataCache.java > Clean up code in ClusterDataCache > - > > Key: HELIX-721 > URL: https://issues.apache.org/jira/browse/HELIX-721 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Clean up code in ClusterDataCache -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-720) [TASK] Implement AssignableInstanceManager
[ https://issues.apache.org/jira/browse/HELIX-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537866#comment-16537866 ] Hudson commented on HELIX-720: -- FAILURE: Integrated in Jenkins build helix #1496 (See [https://builds.apache.org/job/helix/1496/]) [HELIX-720] [TASK] Implement AssignableInstanceManager (narendly: rev 034424cc4852bda55e74ddbbf42db4c7f293262c) * (add) helix-core/src/test/java/org/apache/helix/task/TestAssignableInstanceManager.java * (add) helix-core/src/main/java/org/apache/helix/task/AssignableInstanceManager.java * (add) helix-core/src/test/java/org/apache/helix/task/TestAssignableInstanceManagerControllerSwitch.java * (edit) helix-core/src/test/java/org/apache/helix/task/assigner/TestAssignableInstance.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java > [TASK] Implement AssignableInstanceManager > -- > > Key: HELIX-720 > URL: https://issues.apache.org/jira/browse/HELIX-720 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > AssignableInstanceManager supports job quotas in Task Framework by 1. > Re-creating the AssignableInstance map with correct resource usage based on > TaskContexts 2. Providing an update API that refreshes instances and configs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-716) [HELIX] Make downward load balance also be subject to StateTransitionThrottleConfig
[ https://issues.apache.org/jira/browse/HELIX-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537795#comment-16537795 ] Hudson commented on HELIX-716: -- FAILURE: Integrated in Jenkins build helix #1494 (See [https://builds.apache.org/job/helix/1494/]) [HELIX-716] [HELIX] Make downward load balance also be subject to (narendly: rev dd3be71c9423eea8283bda6819beb76edecf1fb2) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java * (edit) helix-core/src/test/java/org/apache/helix/integration/TestPartitionMovementThrottle.java > [HELIX] Make downward load balance also be subject to > StateTransitionThrottleConfig > --- > > Key: HELIX-716 > URL: https://issues.apache.org/jira/browse/HELIX-716 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > In the previous implementation of allowing downward transitions, downward > transitions were not subject to any throttling constraints. In this change, > downward load balance transitions are made subject to the throttling > constraints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-718) Implement TaskAssignment logics
[ https://issues.apache.org/jira/browse/HELIX-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537791#comment-16537791 ] Hudson commented on HELIX-718: -- FAILURE: Integrated in Jenkins build helix #1493 (See [https://builds.apache.org/job/helix/1493/]) [HELIX-718] implement ThreadCountBasedTaskAssigner (hrzhang: rev 4c3ad2aecc07de97d5f1976a61858ddbe2f836ed) * (add) helix-core/src/test/java/org/apache/helix/task/assigner/AssignerTestBase.java * (add) helix-core/src/test/java/org/apache/helix/task/assigner/TestThreadCountBasedTaskAssigner.java * (edit) helix-core/src/test/java/org/apache/helix/task/assigner/TestAssignableInstance.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/TaskAssignResult.java * (add) helix-core/src/main/java/org/apache/helix/task/assigner/ThreadCountBasedTaskAssigner.java > Implement TaskAssignment logics > --- > > Key: HELIX-718 > URL: https://issues.apache.org/jira/browse/HELIX-718 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > Implement assignment logics: > TaskAssigner, TaskAssignResult, AssignableInstance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-718) Implement TaskAssignment logics
[ https://issues.apache.org/jira/browse/HELIX-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537786#comment-16537786 ] Hudson commented on HELIX-718: -- FAILURE: Integrated in Jenkins build helix #1492 (See [https://builds.apache.org/job/helix/1492/]) [HELIX-718] provide a method in AssignableInstance to set current (hrzhang: rev e44b29e03ef4c807e940cde717ed2f6fff58a273) * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java * (edit) helix-core/src/test/java/org/apache/helix/task/assigner/TestAssignableInstance.java > Implement TaskAssignment logics > --- > > Key: HELIX-718 > URL: https://issues.apache.org/jira/browse/HELIX-718 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > Implement assignment logics: > TaskAssigner, TaskAssignResult, AssignableInstance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-718) Implement TaskAssignment logics
[ https://issues.apache.org/jira/browse/HELIX-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537764#comment-16537764 ] Hudson commented on HELIX-718: -- FAILURE: Integrated in Jenkins build helix #1491 (See [https://builds.apache.org/job/helix/1491/]) [HELIX-718] implement AssignableInstance (hrzhang: rev 2049f93abe8e56a754e4880a9157959ef24cd89e) * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/TaskAssignResult.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/TaskAssigner.java * (add) helix-core/src/test/java/org/apache/helix/task/assigner/TestAssignableInstance.java * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskConfig.java > Implement TaskAssignment logics > --- > > Key: HELIX-718 > URL: https://issues.apache.org/jira/browse/HELIX-718 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > Implement assignment logics: > TaskAssigner, TaskAssignResult, AssignableInstance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-718) Implement TaskAssignment logics
[ https://issues.apache.org/jira/browse/HELIX-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537711#comment-16537711 ] Hudson commented on HELIX-718: -- FAILURE: Integrated in Jenkins build helix #1488 (See [https://builds.apache.org/job/helix/1488/]) [HELIX-718] implement TaskAssignResult (hrzhang: rev 442cd096dd82ae0b2ee72232025b1972aced7cd9) * (edit) helix-core/src/main/java/org/apache/helix/task/assigner/TaskAssignResult.java > Implement TaskAssignment logics > --- > > Key: HELIX-718 > URL: https://issues.apache.org/jira/browse/HELIX-718 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > Implement assignment logics: > TaskAssigner, TaskAssignResult, AssignableInstance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-717) Add api for get / set quota type, ratio and participant capacity
[ https://issues.apache.org/jira/browse/HELIX-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537699#comment-16537699 ] Hudson commented on HELIX-717: -- FAILURE: Integrated in Jenkins build helix #1486 (See [https://builds.apache.org/job/helix/1486/]) [HELIX-717] Add api for get / set quota type, ratio and participant (hrzhang: rev 4c7661017e856c42b69356665b908444f589fe2c) * (edit) helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskConfig.java * (edit) helix-core/src/main/java/org/apache/helix/model/LiveInstance.java > Add api for get / set quota type, ratio and participant capacity > > > Key: HELIX-717 > URL: https://issues.apache.org/jira/browse/HELIX-717 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > This is needed for supporting quota based task assignment -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-715) Add NOP classes for quota management support
[ https://issues.apache.org/jira/browse/HELIX-715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537593#comment-16537593 ] Hudson commented on HELIX-715: -- FAILURE: Integrated in Jenkins build helix #1485 (See [https://builds.apache.org/job/helix/1485/]) [HELIX-715] Add NOP classes for quota management support (narendly: rev ebbd6ba2ed57e75e5fe3506aa6fcd9f5938330fd) * (add) helix-core/src/main/java/org/apache/helix/task/assigner/AssignableInstance.java * (add) helix-core/src/main/java/org/apache/helix/task/assigner/TaskAssigner.java * (add) helix-core/src/main/java/org/apache/helix/task/assigner/TaskAssignResult.java > Add NOP classes for quota management support > > > Key: HELIX-715 > URL: https://issues.apache.org/jira/browse/HELIX-715 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > The following classes and interfaces were added: TaskAssigner, > AssignableInstance, and TaskAssignResult. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-714) [HaaS] Fix aggregate metrics in ClusterStatusMonitor
[ https://issues.apache.org/jira/browse/HELIX-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537564#comment-16537564 ] Hudson commented on HELIX-714: -- FAILURE: Integrated in Jenkins build helix #1483 (See [https://builds.apache.org/job/helix/1483/]) [HELIX-714] [HaaS] Fix aggregate metrics in ClusterStatusMonitor (narendly: rev 8a6ac8ff278aa9b4ad8445700266af820d0d62cc) * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java * (edit) helix-core/src/test/java/org/apache/helix/monitoring/mbeans/TestClusterAggregateMetrics.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitorMBean.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java > [HaaS] Fix aggregate metrics in ClusterStatusMonitor > > > Key: HELIX-714 > URL: https://issues.apache.org/jira/browse/HELIX-714 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Names of the metrics have been fixed per Helix's convention and loops are now > used instead of using delta values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-713) Remove unused imports in TaskAssignmentCalculator
[ https://issues.apache.org/jira/browse/HELIX-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537560#comment-16537560 ] Hudson commented on HELIX-713: -- FAILURE: Integrated in Jenkins build helix #1482 (See [https://builds.apache.org/job/helix/1482/]) [HELIX-713] Remove unused imports in TaskAssignmentCalculator (narendly: rev e1ca65193973045b2a372190740b1d9b4a78c5d8) * (edit) helix-core/src/main/java/org/apache/helix/task/GenericTaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/task/TaskAssignmentCalculator.java * (edit) helix-core/src/main/java/org/apache/helix/task/FixedTargetTaskAssignmentCalculator.java > Remove unused imports in TaskAssignmentCalculator > - > > Key: HELIX-713 > URL: https://issues.apache.org/jira/browse/HELIX-713 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Remove unused imports in TaskAssignmentCalculator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-709) Prepare controller stages for async execution
[ https://issues.apache.org/jira/browse/HELIX-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537556#comment-16537556 ] Hudson commented on HELIX-709: -- FAILURE: Integrated in Jenkins build helix #1481 (See [https://builds.apache.org/job/helix/1481/]) [HELIX-709] Move external view calculation to async stage and (hrzhang: rev 542fbc840a167986a40bd57f3c5660d294acb63c) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ExternalViewComputeStage.java * (edit) helix-core/src/test/java/org/apache/helix/ZkUnitTestBase.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java * (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java * (edit) helix-core/src/main/java/org/apache/helix/controller/pipeline/AsyncWorkerType.java > Prepare controller stages for async execution > - > > Key: HELIX-709 > URL: https://issues.apache.org/jira/browse/HELIX-709 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > There are a couple of stages in the Helix controller that can be executed > asynchronously, but each execution should be done in order. Currently the > Helix controller has a thread pool for un-ordered execution, but we also > need one for ordered execution. > This ticket should do the following: > 1. Create a pool of configurable workers using DedupEventProcessor > 2. Create AbstractAsyncBaseStage for those stages that can be executed > asynchronously to share common code > AC: > Create AbstractAsyncBaseStage and DedupFIFOWorkerPool for async execution, > pass all tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
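The requirement in HELIX-709 that asynchronous stage work still runs in order can be illustrated with a single-threaded executor from the JDK. This is only a sketch under stated assumptions: OrderedAsyncWorker is a hypothetical name, it is not the DedupEventProcessor or DedupFIFOWorkerPool mentioned in the ticket, and it leaves out event de-duplication entirely.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical worker: runs submitted work off the caller's thread, in FIFO order. */
final class OrderedAsyncWorker {
  // A single worker thread processes its queue sequentially,
  // so tasks execute in the order they were submitted.
  private final ExecutorService worker = Executors.newSingleThreadExecutor();

  /** Queues the work and returns immediately; the caller is not blocked. */
  void submit(Runnable stageWork) {
    worker.execute(stageWork);
  }

  /** Stops accepting work and waits for queued work to finish. */
  boolean drain(long timeoutMs) {
    worker.shutdown();
    try {
      return worker.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

A pool of such workers, one per kind of asynchronous stage, would keep ordering within each kind while letting different kinds proceed independently; whether Helix structures its pool this way is not stated in the message above.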
[jira] [Commented] (HELIX-711) Allow downward state transition during recovery and add recovery threshold
[ https://issues.apache.org/jira/browse/HELIX-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537413#comment-16537413 ] Hudson commented on HELIX-711: -- FAILURE: Integrated in Jenkins build helix #1480 (See [https://builds.apache.org/job/helix/1480/]) [HELIX-711] Allow downward state transition during recovery and add (narendly: rev 37f3d4c8dadd7cadeebad4fb41e2d4b1c38601fa) * (edit) helix-core/src/test/java/org/apache/helix/controller/stages/TestIntermediateStateCalcStage.java * (add) helix-core/src/test/java/org/apache/helix/controller/stages/TestRecoveryLoadBalance.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java * (add) helix-core/src/test/resources/TestRecoveryLoadBalance.MasterSlave.json * (add) helix-core/src/test/resources/TestRecoveryLoadBalance.OnlineOffline.json * (edit) helix-core/src/main/java/org/apache/helix/api/config/StateTransitionThrottleConfig.java * (edit) helix-core/src/main/java/org/apache/helix/model/ClusterConfig.java > Allow downward state transition during recovery and add recovery threshold > -- > > Key: HELIX-711 > URL: https://issues.apache.org/jira/browse/HELIX-711 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > # Previously, a single partition requiring recovery balance would block all > types of load-balance. This commit allows all downward state transitions > (load balance) to happen even when recovery balance is happening in the same > cycle. > # As for non-downward state transitions load-balance, a parameter, > ErrorOrRecoveryPartitionThresholdForLoadBalance, was added to ClusterConfig. > If the number of partitions requiring recovery is lower than the threshold, > non-downward load-balance will take place in the same cycle as recovery > balance; otherwise, non-downward load-balance will not take place in the same > cycle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
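The two rules in the HELIX-711 description can be read as a small decision function, sketched below. All names here (RecoveryLoadBalanceGate, TransitionKind, allowedThisCycle) are hypothetical and are not taken from IntermediateStateCalcStage. The strict "lower than" comparison follows the wording of this message only, and throttling is ignored.

```java
/** Hypothetical gate deciding which transitions may run in the current cycle. */
final class RecoveryLoadBalanceGate {
  enum TransitionKind { RECOVERY, DOWNWARD_LOAD_BALANCE, NON_DOWNWARD_LOAD_BALANCE }

  /**
   * @param partitionsNeedingRecovery number of partitions in error or needing recovery
   * @param threshold value of the threshold parameter (assumed non-negative)
   */
  static boolean allowedThisCycle(
      TransitionKind kind, int partitionsNeedingRecovery, int threshold) {
    switch (kind) {
      case RECOVERY:
      case DOWNWARD_LOAD_BALANCE:
        // Rule 1: downward transitions are no longer blocked by recovery balance.
        return true;
      case NON_DOWNWARD_LOAD_BALANCE:
        // Rule 2: allowed only while the recovery count is lower than the threshold.
        return partitionsNeedingRecovery < threshold;
      default:
        throw new IllegalArgumentException("unknown kind: " + kind);
    }
  }
}
```

For example, with a threshold of 3, two partitions in recovery still allow non-downward load balance in the same cycle, while three do not.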
[jira] [Commented] (HELIX-709) Prepare controller stages for async execution
[ https://issues.apache.org/jira/browse/HELIX-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526974#comment-16526974 ] Hudson commented on HELIX-709: -- FAILURE: Integrated in Jenkins build helix #1478 (See [https://builds.apache.org/job/helix/1478/]) [HELIX-709] Prepare controller stages for async execution (hrzhang: rev d22adbf9760316118dd8e6eda5aba4219e399a60) * (edit) helix-core/src/main/java/org/apache/helix/controller/pipeline/AbstractBaseStage.java * (delete) helix-core/src/main/java/org/apache/helix/controller/stages/AsyncWorkerType.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/PersistAssignmentStage.java * (add) helix-core/src/main/java/org/apache/helix/controller/pipeline/AsyncWorkerType.java * (edit) helix-core/src/main/java/org/apache/helix/controller/pipeline/Pipeline.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/TargetExteralViewCalcStage.java * (add) helix-core/src/main/java/org/apache/helix/controller/pipeline/AbstractAsyncBaseStage.java * (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java * (edit) helix-core/src/test/java/org/apache/helix/integration/common/ZkIntegrationTestBase.java > Prepare controller stages for async execution > - > > Key: HELIX-709 > URL: https://issues.apache.org/jira/browse/HELIX-709 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > There are a couple of stages in the Helix controller that can be executed > asynchronously, but each execution should be done in order. Currently the > Helix controller has a thread pool for un-ordered execution, but we also > need one for ordered execution. > This ticket should do the following: > 1. Create a pool of configurable workers using DedupEventProcessor > 2. Create AbstractAsyncBaseStage for those stages that can be executed > asynchronously to share common code > AC: > Create AbstractAsyncBaseStage and DedupFIFOWorkerPool for async execution, > pass all tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-710) Create abstract state model for distributed leader standby helix service
[ https://issues.apache.org/jira/browse/HELIX-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526858#comment-16526858 ] Hudson commented on HELIX-710: -- FAILURE: Integrated in Jenkins build helix #1477 (See [https://builds.apache.org/job/helix/1477/]) [HELIX-710] Create abstract state model for distributed leader standby (hrzhang: rev 4a99bc43c6f22e478a49fb7f2bbac42d608f17b5) * (edit) helix-core/src/main/java/org/apache/helix/participant/DistClusterControllerStateModel.java * (add) helix-core/src/main/java/org/apache/helix/participant/AbstractHelixLeaderStandbyStateModel.java > Create abstract state model for distributed leader standby helix service > > > Key: HELIX-710 > URL: https://issues.apache.org/jira/browse/HELIX-710 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > In order to implement state model def for other helix services, I'd prefer to > abstract an interface that helix service would use, to avoid duplicated code. > AC: > - implement AbstractHelixLeaderStandbyStateModel and implement cluster > controller state model with it. The abstract model can also be used by other > helix services -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-706) ExternalViewGeneration should be executed asynchronously in Helix controller
[ https://issues.apache.org/jira/browse/HELIX-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526795#comment-16526795 ] Hudson commented on HELIX-706: -- FAILURE: Integrated in Jenkins build helix #1476 (See [https://builds.apache.org/job/helix/1476/]) [HELIX-706] process tev and persist assignment asynchronously (hrzhang: rev 7a2b9693d49d578ac9121a944f1b469c2f2316d9) * (edit) helix-core/src/main/java/org/apache/helix/common/DedupEventProcessor.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterDataCache.java * (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/PersistAssignmentStage.java * (edit) helix-core/src/main/java/org/apache/helix/controller/pipeline/AbstractBaseStage.java * (add) helix-core/src/main/java/org/apache/helix/controller/stages/AsyncWorkerType.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/TargetExteralViewCalcStage.java > ExternalViewGeneration should be executed asynchronously in Helix controller > > > Key: HELIX-706 > URL: https://issues.apache.org/jira/browse/HELIX-706 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > EV generation should not block helix resource rebalance. According to our > profiling results, external view generation takes ~ 1/5 of the pipeline > latency. > The goal is to generate external view asynchronously, and hopefully we can > have 20% improvement in rebalance pipeline -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-705) Participant duplicated state transition handling rework
[ https://issues.apache.org/jira/browse/HELIX-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524291#comment-16524291 ] Hudson commented on HELIX-705: -- FAILURE: Integrated in Jenkins build helix #1471 (See [https://builds.apache.org/job/helix/1471/]) [HELIX-705]: Participant duplicated state transition handling rework (hrzhang: rev 8dc19afb9b70d262da0eb2081840d65f2a031122) * (edit) helix-core/src/main/java/org/apache/helix/messaging/handling/HelixStateTransitionHandler.java * (edit) helix-core/src/test/java/org/apache/helix/messaging/handling/TestHelixTaskExecutor.java * (edit) helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java * (edit) helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTask.java > Participant duplicated state transition handling rework > --- > > Key: HELIX-705 > URL: https://issues.apache.org/jira/browse/HELIX-705 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > Helix needs some rework of participant-side message handling: > - Duplicated messages in the same batch: discard the later one > - Duplicated messages in different batches: the later one should be discarded > if the first one is in progress > - During state transition, we should not rely on the current state delta to get > a partition's current state, but should lock on the state model def (thread safety) > - A duplicated state transition (toState == currentState) should not result in > an error, which is confusing, but should report success -- This message was sent by Atlassian JIRA (v7.6.3#76005)
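The handling rules listed in HELIX-705 can be sketched roughly as follows. This is only an illustration, not the code from the commit: `DuplicateTransitionFilter`, `Msg`, and the decision to treat two messages as duplicates when they target the same partition and the same toState are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these types are Helix classes. */
final class DuplicateTransitionFilter {

  /** Minimal stand-in for a state transition message. */
  static final class Msg {
    final String id;
    final String partition;
    final String toState;

    Msg(String id, String partition, String toState) {
      this.id = id;
      this.partition = partition;
      this.toState = toState;
    }

    /** Assumption: same partition and same target state means "duplicate". */
    String key() {
      return partition + "->" + toState;
    }
  }

  // Keys of transitions that were accepted and have not been marked finished.
  private final Set<String> inProgress = new HashSet<>();

  /** Returns the messages to execute; later duplicates are dropped. */
  synchronized List<Msg> accept(List<Msg> batch) {
    List<Msg> accepted = new ArrayList<>();
    for (Msg m : batch) {
      // Set.add returns false when the key is already present, which covers
      // a duplicate in this batch and a duplicate of an in-progress message.
      if (inProgress.add(m.key())) {
        accepted.add(m);
      }
    }
    return accepted;
  }

  /** Call when a transition completes so that a later message is accepted again. */
  synchronized void finished(Msg m) {
    inProgress.remove(m.key());
  }

  /** A transition whose target equals the current state is treated as a success, not an error. */
  static boolean isAlreadyInTargetState(String currentState, String toState) {
    return toState.equals(currentState);
  }
}
```

Both methods that touch the in-progress set are synchronized, which is one simple way to get the thread safety the description asks for; the real handler may lock on something else.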
[jira] [Commented] (HELIX-703) Change print statement to log statement
[ https://issues.apache.org/jira/browse/HELIX-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524268#comment-16524268 ] Hudson commented on HELIX-703: -- FAILURE: Integrated in Jenkins build helix #1469 (See [https://builds.apache.org/job/helix/1469/]) [HELIX-703] Change print statement to log statement (narendly: rev 0d77cbafc6534dd7b0e9867b1dbf8a2266fd2281) * (edit) helix-core/src/main/java/org/apache/helix/ConfigAccessor.java * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKUtil.java > Change print statement to log statement > --- > > Key: HELIX-703 > URL: https://issues.apache.org/jira/browse/HELIX-703 > Project: Apache Helix > Issue Type: Improvement > Components: helix-core >Reporter: Hunter L >Priority: Major > > Change print statement to log statement -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-704) Refactor IntermediateState and throttling code
[ https://issues.apache.org/jira/browse/HELIX-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524269#comment-16524269 ] Hudson commented on HELIX-704: -- FAILURE: Integrated in Jenkins build helix #1469 (See [https://builds.apache.org/job/helix/1469/]) [HELIX-704] Refactor IntermediateState and throttling code (narendly: rev 323fbd049ed84a17ca8a2a00019bf76d51f5346e) * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/IntermediateStateCalcStage.java * (edit) helix-core/src/main/java/org/apache/helix/api/config/StateTransitionThrottleConfig.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/StateTransitionThrottleController.java > Refactor IntermediateState and throttling code > -- > > Key: HELIX-704 > URL: https://issues.apache.org/jira/browse/HELIX-704 > Project: Apache Helix > Issue Type: Improvement > Components: helix-core >Reporter: Hunter L >Priority: Major > > Refactor IntermediateState and throttling code with comments. > Changelist: > 1. Add clear comments/JavaDoc/log messages > 2. Rename confusing variable names and typos > 3. A few small micro-optimizations > There is no change in the code logic -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-681) Participant should not fail state transition on fail to delete / relay message
[ https://issues.apache.org/jira/browse/HELIX-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451480#comment-16451480 ] Hudson commented on HELIX-681: -- FAILURE: Integrated in Jenkins build helix #1461 (See [https://builds.apache.org/job/helix/1461/]) [HELIX-681] change controller msg purge timeout to larger number (zhan849: rev ba86a3f554d600a2f781f0975335f3bad431a3ba) * (edit) helix-core/src/test/java/org/apache/helix/ZkUnitTestBase.java * (edit) helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java * (edit) helix-core/src/main/java/org/apache/helix/SystemPropertyKeys.java * (edit) helix-core/src/test/java/org/apache/helix/controller/stages/TestRebalancePipeline.java > Participant should not fail state transition on fail to delete / relay message > -- > > Key: HELIX-681 > URL: https://issues.apache.org/jira/browse/HELIX-681 > Project: Apache Helix > Issue Type: Bug >Reporter: Hao Zhang >Priority: Major > > Currently we have a general try-catch block in HelixTask and > HelixTaskExecutor, which, upon any exception thrown from the state transition > routine, fails the state transition. However, there are at least the following > cases in which the state transition should be considered successful: > * When we fail to delete the message after successfully handling it and > updating the current state -> the state transition is already complete > and the current state is consistent between participant and ZK > * When we fail to send out a relay message -> relay messages are delivered only > on a best-effort basis, which has nothing to do with the state > transition's result. If relaying fails, the controller will > resend the message, which ensures correctness. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
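A minimal sketch of the behaviour HELIX-681 asks for, assuming the work can be split into three steps. `TransitionRunner` and `Step` are hypothetical names invented for this example; they are not the HelixTask or HelixTaskExecutor code.

```java
/** Hypothetical sketch; none of these names are Helix APIs. */
final class TransitionRunner {

  /** A step that may throw, such as deleting a message or relaying it. */
  interface Step {
    void run() throws Exception;
  }

  /** Returns true when the state transition itself succeeded. */
  static boolean execute(Step transition, Step deleteMessage, Step sendRelay) {
    try {
      transition.run();
    } catch (Exception e) {
      // Only a failure of the transition routine fails the transition.
      return false;
    }
    // Follow-up steps are best effort: failures are logged and ignored.
    bestEffort("delete message", deleteMessage);
    bestEffort("send relay message", sendRelay);
    return true;
  }

  private static void bestEffort(String what, Step step) {
    try {
      step.run();
    } catch (Exception e) {
      System.err.println("Failed to " + what + ", ignoring: " + e);
    }
  }
}
```

The point of the split is that the catch block around the transition no longer covers the cleanup steps, so a ZK hiccup during message deletion cannot turn a completed transition into a reported failure.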
[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource
[ https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451358#comment-16451358 ] Hudson commented on HELIX-682: -- FAILURE: Integrated in Jenkins build helix #1451 (See [https://builds.apache.org/job/helix/1451/]) [HELIX-682] delete duplicated message and log error in HelixTaskExecutor (zhan849: rev 5f9fadc72bc1916f008792707db848ee51bbd997) * (edit) helix-core/src/test/java/org/apache/helix/messaging/handling/TestHelixTaskExecutor.java * (edit) helix-core/src/test/java/org/apache/helix/MockAccessor.java * (edit) helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java > Stale message should not prevent controller from rebalancing resource > - > > Key: HELIX-682 > URL: https://issues.apache.org/jira/browse/HELIX-682 > Project: Apache Helix > Issue Type: Bug >Reporter: Hao Zhang >Priority: Major > > Currently, during MessageGenerationPhase, we skip rebalancing when there is a > pending message. Although we assume that participants delete messages when > they finish the task, there are cases where ZK is unstable and a > participant fails to do so, which leaves the message undeleted and thus blocks > rebalancing. > Ideally, the controller side should try to delete messages as well: if a > partition's current state is the same as the message's toState, or a totally > invalid message remains, the controller should try to delete the message to unblock > rebalancing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
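The controller-side check described in HELIX-682 might look something like the sketch below. `StaleMessageFinder` and `PendingMsg` are hypothetical, and treating a message with no partition or no toState as "totally invalid" is an assumption; the issue does not define that term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are Helix classes. */
final class StaleMessageFinder {

  /** Minimal stand-in for a pending state transition message. */
  static final class PendingMsg {
    final String id;
    final String partition;
    final String toState;

    PendingMsg(String id, String partition, String toState) {
      this.id = id;
      this.partition = partition;
      this.toState = toState;
    }
  }

  /** Returns the ids of messages the controller could delete to unblock rebalancing. */
  static List<String> staleIds(List<PendingMsg> pending, Map<String, String> currentStates) {
    List<String> stale = new ArrayList<>();
    for (PendingMsg m : pending) {
      // Assumption: a message without a partition or target state is invalid.
      boolean invalid = m.partition == null || m.toState == null;
      // The partition has already reached the state the message asks for.
      boolean alreadyApplied = !invalid && m.toState.equals(currentStates.get(m.partition));
      if (invalid || alreadyApplied) {
        stale.add(m.id);
      }
    }
    return stale;
  }
}
```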
[jira] [Commented] (HELIX-674) Constraint Based Resource Rebalancer
[ https://issues.apache.org/jira/browse/HELIX-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449080#comment-16449080 ] Hudson commented on HELIX-674: -- FAILURE: Integrated in Jenkins build helix #1448 (See [https://builds.apache.org/job/helix/1448/]) [HELIX-674] Introducing constraints based rebalancing mechanism. (ericwang1985: rev 3d2d57b05af443dcc8658f8b7bfc1fc0d18cd196) * (add) helix-core/src/test/java/org/apache/helix/controller/rebalancer/constraint/dataprovider/MockPartitionWeightProvider.java * (add) helix-core/src/main/java/org/apache/helix/controller/rebalancer/strategy/ConstraintRebalanceStrategy.java * (add) helix-core/src/test/java/org/apache/helix/controller/rebalancer/TestConstraintRebalanceStrategy.java * (add) helix-core/src/main/java/org/apache/helix/api/rebalancer/constraint/dataprovider/CapacityProvider.java * (edit) helix-core/src/main/java/org/apache/helix/controller/rebalancer/strategy/crushMapping/CardDealingAdjustmentAlgorithm.java * (add) helix-core/src/main/java/org/apache/helix/controller/rebalancer/constraint/dataprovider/ZkBasedCapacityProvider.java * (add) helix-core/src/main/java/org/apache/helix/api/rebalancer/constraint/dataprovider/PartitionWeightProvider.java * (add) helix-core/src/test/java/org/apache/helix/integration/TestWeightBasedRebalanceUtil.java * (edit) helix-core/src/main/java/org/apache/helix/controller/rebalancer/strategy/AbstractEvenDistributionRebalanceStrategy.java * (add) helix-core/src/main/java/org/apache/helix/examples/WeightAwareRebalanceUtilExample.java * (add) helix-core/src/main/java/org/apache/helix/controller/rebalancer/constraint/dataprovider/ZkBasedPartitionWeightProvider.java * (add) helix-core/src/main/java/org/apache/helix/api/rebalancer/constraint/AbstractRebalanceHardConstraint.java * (add) helix-core/src/main/java/org/apache/helix/controller/rebalancer/constraint/TotalCapacityConstraint.java * (add) 
helix-core/src/main/java/org/apache/helix/util/WeightAwareRebalanceUtil.java * (add) helix-core/src/test/java/org/apache/helix/controller/rebalancer/constraint/dataprovider/MockCapacityProvider.java * (add) helix-core/src/main/java/org/apache/helix/api/rebalancer/constraint/AbstractRebalanceSoftConstraint.java * (add) helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/ResourceUsageCalculator.java * (add) helix-core/src/main/java/org/apache/helix/controller/rebalancer/constraint/PartitionWeightAwareEvennessConstraint.java > Constraint Based Resource Rebalancer > > > Key: HELIX-674 > URL: https://issues.apache.org/jira/browse/HELIX-674 > Project: Apache Helix > Issue Type: New Feature >Reporter: Jiajun Wang >Assignee: Jiajun Wang >Priority: Major > Fix For: 0.8.x > > Attachments: Constraint-BasedResourceRebalancing-080318-2226-240.pdf > > > The Helix rebalancer assigns resources according to different strategies. > Recently, we optimized the strategy for evenness and minimal movement. > However, the evenness here only applies to partition numbers. Moreover, we've > received more requests for a customizable rebalancer from our users. > Take partition weight as an example: > In reality, partition replicas have different sizes. We use "partition weight" > as an abstraction of the partition size. It can be network traffic usage, > disk usage, or any combination of such factors. > Given that each partition may have a different weight, Helix should be able to > assign partitions accordingly, so that the distribution is even > with respect to weight. > In this project, we are planning a new rebalancer mechanism that generates > resource partition assignments according to a list of "constraints". The current > rebalance strategy can be regarded as one kind of constraint. Moving forward, > Helix users would be able to extend the constraint interface using their own > logic. > Some initial discussions are in progress and we will have a proposal posted here > soon.
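To make the "partition weight" idea in HELIX-674 concrete, here is a small greedy heuristic that places the heaviest partitions first on the least-loaded instance that still has capacity. It is not the ConstraintRebalanceStrategy added by this commit, and `WeightedGreedyAssigner` with its single per-instance capacity parameter is a simplification invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of weight-aware assignment; not a Helix class. */
final class WeightedGreedyAssigner {

  /** Maps each instance to the partitions placed on it. */
  static Map<String, List<String>> assign(
      Map<String, Integer> partitionWeights, List<String> instances, int capacityPerInstance) {
    Map<String, Integer> load = new LinkedHashMap<>();
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (String instance : instances) {
      load.put(instance, 0);
      result.put(instance, new ArrayList<>());
    }

    // Heaviest partitions first, so the small ones can even out the remainder.
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>(partitionWeights.entrySet());
    sorted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

    for (Map.Entry<String, Integer> p : sorted) {
      String best = null;
      for (String instance : instances) {
        // Hard constraint: total weight on an instance must not exceed its capacity.
        boolean fits = load.get(instance) + p.getValue() <= capacityPerInstance;
        // Soft goal: prefer the instance carrying the least weight so far.
        if (fits && (best == null || load.get(instance) < load.get(best))) {
          best = instance;
        }
      }
      if (best == null) {
        throw new IllegalStateException("No instance has capacity for " + p.getKey());
      }
      load.put(best, load.get(best) + p.getValue());
      result.get(best).add(p.getKey());
    }
    return result;
  }
}
```

The capacity check plays the role of a hard constraint and the least-loaded preference plays the role of a soft one, which loosely mirrors the hard/soft constraint split suggested by the file names in the commit.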
[jira] [Commented] (HELIX-696) Workflow state messed up after timeout, and is not cleaned
[ https://issues.apache.org/jira/browse/HELIX-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446033#comment-16446033 ] Hudson commented on HELIX-696: -- FAILURE: Integrated in Jenkins build helix #1447 (See [https://builds.apache.org/job/helix/1447/]) [HELIX-696] fix workflow state flip-flop issue (zhan849: rev 317c300c8f951c7e8308cb0b24e48f97a1ef32ef) * (edit) helix-core/src/main/java/org/apache/helix/task/TaskRebalancer.java * (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteWorkflow.java * (edit) helix-core/src/main/java/org/apache/helix/task/WorkflowRebalancer.java * (edit) helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java * (edit) helix-core/src/test/java/org/apache/helix/task/TestJobStateOnCreation.java * (add) helix-core/src/test/java/org/apache/helix/integration/task/TestWorkflowTermination.java > Workflow state messed up after timeout, and is not cleaned > -- > > Key: HELIX-696 > URL: https://issues.apache.org/jira/browse/HELIX-696 > Project: Apache Helix > Issue Type: Bug >Reporter: Hao Zhang >Priority: Major > > A couple of problems with the current workflow finish handling logic: > # After timeout, no timer is scheduled to clean the workflow up when it expires > # After timeout, the state handling logic is messy: previously stopped > workflow states flip-flop between TIMED_OUT and STOPPED > # MBean is not updated correctly because we update latency before setting the finish > time -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-697) Add cluster level metrics in ClusterStatusMonitor
[ https://issues.apache.org/jira/browse/HELIX-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444850#comment-16444850 ] Hudson commented on HELIX-697: -- FAILURE: Integrated in Jenkins build helix #1443 (See [https://builds.apache.org/job/helix/1443/]) [HELIX-697] Add cluster level metrics in ClusterStatusMonitor (narendly: rev e1faf2404c3bb74aab7c402d76246b41af74fd16) * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitor.java * (add) helix-core/src/test/java/org/apache/helix/monitoring/mbeans/TestClusterAggregateMetrics.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/dynamicMBeans/DynamicMBeanProvider.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ClusterStatusMonitorMBean.java * (edit) helix-core/src/main/java/org/apache/helix/monitoring/mbeans/ResourceMonitor.java > Add cluster level metrics in ClusterStatusMonitor > - > > Key: HELIX-697 > URL: https://issues.apache.org/jira/browse/HELIX-697 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > Add cluster level metrics in ClusterStatusMonitor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HELIX-699) Compare InstanceConfigs using their IDs in RoutingTable
[ https://issues.apache.org/jira/browse/HELIX-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444852#comment-16444852 ] Hudson commented on HELIX-699: -- FAILURE: Integrated in Jenkins build helix #1443 (See [https://builds.apache.org/job/helix/1443/]) [HELIX-699] Compare InstanceConfigs using their IDs in RoutingTable (narendly: rev 90ef589aa47ef1726356ce5ea37e12d27372b342) * (edit) helix-core/src/main/java/org/apache/helix/spectator/RoutingTable.java > Compare InstanceConfigs using their IDs in RoutingTable > --- > > Key: HELIX-699 > URL: https://issues.apache.org/jira/browse/HELIX-699 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > A possible race condition was causing an NPE on InstanceConfig.getHostName(). > Instead of comparing hostnames and ports, we compare IDs, which are supposed > to be a concatenation of instance name, hostname, and port anyway and should > always be set. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
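The difference between the two comparisons in HELIX-699 can be illustrated with a stand-in type. `InstanceIdOrdering` and its nested `Instance` class are hypothetical and only mimic an InstanceConfig whose hostname and port may be unset; the comparators are not the ones in RoutingTable.

```java
import java.util.Comparator;

/** Hypothetical sketch; Instance is a stand-in, not Helix's InstanceConfig. */
final class InstanceIdOrdering {

  static final class Instance {
    final String id;       // assumed to always be set
    final String hostName; // may be null if the config is only partially populated
    final String port;     // may be null for the same reason

    Instance(String id, String hostName, String port) {
      this.id = id;
      this.hostName = hostName;
      this.port = port;
    }
  }

  /** Fragile: throws NullPointerException when hostName or port is null. */
  static final Comparator<Instance> BY_HOST_AND_PORT = (a, b) -> {
    int byHost = a.hostName.compareTo(b.hostName);
    return byHost != 0 ? byHost : a.port.compareTo(b.port);
  };

  /** Safer under the assumption that the id is always present. */
  static final Comparator<Instance> BY_ID = Comparator.comparing((Instance i) -> i.id);
}
```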
[jira] [Commented] (HELIX-698) Add periodic refresh to RoutingTableProvider
[ https://issues.apache.org/jira/browse/HELIX-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444851#comment-16444851 ] Hudson commented on HELIX-698: -- FAILURE: Integrated in Jenkins build helix #1443 (See [https://builds.apache.org/job/helix/1443/]) [HELIX-698] Add periodic refresh to RoutingTableProvider (narendly: rev 0e4163f18c1274c0f77320698e9dfbf42314810d) * (edit) helix-core/src/main/java/org/apache/helix/spectator/RoutingTableProvider.java * (edit) helix-core/src/main/java/org/apache/helix/NotificationContext.java * (add) helix-core/src/test/java/org/apache/helix/integration/spectator/TestRoutingTableProviderPeriodicRefresh.java * (edit) helix-core/src/main/java/org/apache/helix/common/caches/BasicClusterDataCache.java > Add periodic refresh to RoutingTableProvider > - > > Key: HELIX-698 > URL: https://issues.apache.org/jira/browse/HELIX-698 > Project: Apache Helix > Issue Type: Improvement >Reporter: Hunter L >Priority: Major > > There have been incidents where RoutingTableProvider was not getting a proper > refresh, potentially due to lag in the ZKClient CallbackHandler or > connectivity issues. By initiating periodic refreshes, this change avoids cases where > RoutingTableProvider is severely delayed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
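One common way to get a periodic refresh like the one described in HELIX-698 is a single-threaded scheduler from java.util.concurrent. The sketch below assumes that approach; `PeriodicRefresher` is a hypothetical wrapper, and the actual RoutingTableProvider change may be wired differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the RoutingTableProvider implementation. */
final class PeriodicRefresher implements AutoCloseable {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Runs the given refresh action every periodMs milliseconds until closed. */
  PeriodicRefresher(Runnable refresh, long periodMs) {
    scheduler.scheduleAtFixedRate(
        () -> {
          try {
            refresh.run();
          } catch (RuntimeException e) {
            // An uncaught exception would cancel all future runs, so log and continue.
            System.err.println("Periodic refresh failed: " + e);
          }
        },
        periodMs,
        periodMs,
        TimeUnit.MILLISECONDS);
  }

  /** Stops the scheduler thread; no further refreshes run after this. */
  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

The periodic run is a safety net on top of the normal callback-driven refresh: if a callback is lost or delayed, the data is still at most one period stale.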
[jira] [Commented] (HELIX-695) Add Helix Manager listener for new connection notification
[ https://issues.apache.org/jira/browse/HELIX-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444555#comment-16444555 ] Hudson commented on HELIX-695: -- FAILURE: Integrated in Jenkins build helix #1441 (See [https://builds.apache.org/job/helix/1441/]) [HELIX-695] add helix manager listener for new connection notification (zhan849: rev ae8eb5969e8bf5cc72704e0ef316cd4259f3b461) * (edit) helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java * (edit) helix-core/src/test/java/org/apache/helix/integration/TestZkReconnect.java > Add Helix Manager listener for new connection notification > -- > > Key: HELIX-695 > URL: https://issues.apache.org/jira/browse/HELIX-695 > Project: Apache Helix > Issue Type: Task >Reporter: Hao Zhang >Priority: Major > > Currently HelixManager does not notify state listeners about connection > establishment. Adding this notification is useful because HelixManager exposes > a getter for the ZkClient, and when the connection is re-established a new ZkClient > is created; users who obtained the client through the getter should be notified > so they can refresh it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)