[GitHub] helix pull request #296: Skip resources with state model def ref as Task dur...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/296 Skip resources with state model def ref as Task during top state handoff We should not report top state handoff for resources with state model def ref as "Task" as this is meaningless and creates too many mbeans in task-intense environments. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/bug-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/296.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #296 commit 54158099abe38e18229ae74e4707eb4c822405ec Author: Harry Zhang Date: 2018-11-14T22:42:04Z Skip resources with state model def ref as Task ---
[GitHub] helix pull request #294: Implement view cluster aggregator
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/294 Implement view cluster aggregator Based on #266, the design of the Helix view aggregator, here is the implementation. The Helix view aggregator will be a separate module under the helix repo, and the implementation will use components from helix-core. Here is a summary of this PR: - View cluster related information is added to `ClusterConfig`, and the related Java APIs are added to helix-core. - `SourceClusterDataProvider` is a component that watches changes from a source cluster, updates its cached data, and notifies data change events via a given channel - `ViewClusterRefresher` is a component that does the actual refresh operation of the view cluster. It computes the diff between the source clusters and the view cluster, and makes changes to the view cluster accordingly - `SourceClusterConfigChangeAction` is a wrapper containing information about what to update for a source cluster. It takes in the old and new `ClusterConfig` and computes actions to adapt the view cluster to the changes - `HelixViewAggregator` hooks up the small components and contains the main reconciliation loop for refreshing the view cluster - A metrics recording mechanism is added - Starting the Helix view aggregator in stand-alone mode and in distributed mode (by adding a state model to a Helix participant) is supported - Related unit tests and integration tests are added You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/view-aggregator-impl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/294.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #294 commit d6b98f236d85522866e4e7584561a02ce525d142 Author: Harry Zhang Date: 2018-11-01T22:26:45Z [HELIX-781]: add java apis for view cluster config support commit 759cf8a9a48805851c50681afc98b2cafe309ff7 Author: Harry Zhang 
Date: 2018-11-02T21:28:23Z [HELIX-781] set up module structure for helix view aggregator commit 2ae85f6127b43aa7928776ede7e9bbbfe94e9385 Author: Harry Zhang Date: 2018-11-02T21:38:14Z [HELIX-781] implement SourceClusterDataProvider and add tests commit 1df6c7e15bc31fb424883e3febcd9f8cd019de82 Author: Harry Zhang Date: 2018-11-02T21:42:56Z [HELIX-781] implement ViewClusterRefresher and add tests commit 7802cf2443d45092c69cb529b4bba17a3c45dc31 Author: Harry Zhang Date: 2018-11-02T21:50:35Z [HELIX-781] implement SourceClusterConfigChangeAction and added tests commit 985a4ac2bcc43738d9c7ab463ad606f8691f7298 Author: Harry Zhang Date: 2018-11-02T21:59:21Z [HELIX-781] implement helix view aggregator main logic and added tests commit 3e2cfb695720fc09f181b55a3a31adc2b44f94a4 Author: Harry Zhang Date: 2018-11-02T22:08:43Z [HELIX-781] added metrics to helix view aggregator and added tests commit 82d33e0aad1ad2279f864fea6d5851129ba56610 Author: Harry Zhang Date: 2018-11-02T22:11:00Z [HELIX-781] added main function to start helix view aggregator via bash commit 11a7c126ca8bb9298754cb9127af588148af0902 Author: Harry Zhang Date: 2018-11-02T22:12:57Z [HELIX-781] support deploy helix view aggregator in a distributed fashion using helix participant ---
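The refresh step described for `ViewClusterRefresher` in pull request #294 (compute the diff between source clusters and the view cluster, then change the view cluster accordingly) can be illustrated with a small stand-alone sketch. This is not the PR's code: `ViewDiffSketch`, `Diff`, and the use of plain `Map<String, String>` (property name to serialized content) are assumptions made only for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the Helix code base.
final class ViewDiffSketch {
  /** Property names that would have to be created, updated, or deleted in the view. */
  static final class Diff {
    final Set<String> toCreate = new TreeSet<>();
    final Set<String> toUpdate = new TreeSet<>();
    final Set<String> toDelete = new TreeSet<>();
  }

  /** Compares aggregated source data against what the view currently holds. */
  static Diff diff(Map<String, String> aggregatedSource, Map<String, String> view) {
    Diff d = new Diff();
    for (Map.Entry<String, String> e : aggregatedSource.entrySet()) {
      if (!view.containsKey(e.getKey())) {
        d.toCreate.add(e.getKey());            // present in sources, missing in view
      } else if (!Objects.equals(view.get(e.getKey()), e.getValue())) {
        d.toUpdate.add(e.getKey());            // present in both, content differs
      }
    }
    for (String name : view.keySet()) {
      if (!aggregatedSource.containsKey(name)) {
        d.toDelete.add(name);                  // stale entry left in the view
      }
    }
    return d;
  }
}
```

A reconciliation loop such as the one attributed to `HelixViewAggregator` would presumably apply the three sets to the view cluster after each refresh; that part is left out here.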
[GitHub] helix pull request #292: [HELIX-785] Record helix latency instead of user la...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/292 [HELIX-785] Record helix latency instead of user latency in top state handoff metrics - top state handoff reports helix latency instead of user latency - modified test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/top-state-handoff-metrics Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/292.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #292 commit 37a58cfff91fb5f6608a4a06d1922bb5a5eb9ca1 Author: Harry Zhang Date: 2018-11-02T18:30:15Z [HELIX-785] Record helix latency instead of user latency in top state handoff metrics ---
[GitHub] helix pull request #291: Fix unstable TestControllerLeadershipChange
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/291 Fix unstable TestControllerLeadershipChange - make setLeader more reliable - restart participant after manager 1 regains leadership - use cluster verifier to wait for the cluster to converge You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/test-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/291.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #291 commit 43bbc6454ab785a998936b5638c012c1a0076969 Author: Harry Zhang Date: 2018-11-02T00:50:09Z Fix unstable TestControllerLeadershipChange ---
[GitHub] helix pull request #290: fix potential NPE in TopStateHandoffReportStage
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/290 fix potential NPE in TopStateHandoffReportStage You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/minor-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/290.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #290 commit aa1c9ff50975b53d80ba67cd8bdeb51fe782d73d Author: Harry Zhang Date: 2018-11-02T00:48:05Z fix potential NPE in TopStateHandoffReportStage ---
[GitHub] helix pull request #289: [HELIX-780] add task user content related api and a...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/289 [HELIX-780] add task user content related api and added more tests - added get/add task user content rest api - consolidated rest api behavior: when getting/adding user content, if job/workflow does not exist, throw 404 - added more test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/tf-rest-api Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #289 commit 18aa67b6d5c703e5b938b2f915f52a6ca856e889 Author: Harry Zhang Date: 2018-10-09T21:31:00Z [HELIX-780] add task user content related api and added more tests ---
[GitHub] helix pull request #287: [HELIX-780] add get/add job user content rest api
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/287 [HELIX-780] add get/add job user content rest api added apis and tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/tf-rest-api Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/287.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #287 commit a09a18ac55464c3e399800b4474ccb6e64d168ec Author: Harry Zhang Date: 2018-10-08T22:36:53Z [HELIX-780] add get/add job user content rest api ---
[GitHub] helix pull request #285: [HELIX-779] do not clean list field in maintenance ...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/285 [HELIX-779] do not clean list field in maintenance rebalancer for new resources Setting list fields to an empty map prevents resources that were newly added and initially rebalanced during maintenance mode from being rebalanced after the cluster exits maintenance mode. The right thing to do is to clear every preference list. Also added a test case to verify this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/maintenance-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/285.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #285 commit bfaa8399529b6e63b307c1fbe60903c3ca08fbb1 Author: Harry Zhang Date: 2018-10-04T22:50:16Z [HELIX-779] do not clean list field in maintenance rebalancer for new resources ---
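The distinction drawn in pull request #285 between replacing the list fields with an empty map and clearing every preference list can be shown on a plain map of partition name to preference list. This is only a reading of the PR description, not the rebalancer code: `PreferenceListSketch` and both method names are hypothetical, and the assumption is that the fix keeps each partition key with an empty list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: list fields are modeled as partition name -> preference list.
final class PreferenceListSketch {
  /** The behavior described as problematic: every partition key is lost. */
  static Map<String, List<String>> replaceWithEmptyMap(Map<String, List<String>> listFields) {
    return new HashMap<>();
  }

  /** The behavior described as correct: keys stay, each preference list is emptied. */
  static Map<String, List<String>> clearEveryPreferenceList(Map<String, List<String>> listFields) {
    Map<String, List<String>> cleared = new HashMap<>();
    for (String partition : listFields.keySet()) {
      cleared.put(partition, new ArrayList<>());
    }
    return cleared;
  }
}
```

Under that assumption, the second variant still tells later processing which partitions exist, which is one plausible reason the resource can be rebalanced again after maintenance mode ends.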
[GitHub] helix pull request #283: [HELIX-775] consolidate user content related apis f...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/283 [HELIX-775] consolidate user content related apis for task driver HELIX-1315: consolidate user content related apis for task driver To consolidate the task driver user content related APIs and the corresponding REST APIs, I'm deprecating the general getUserContent() API; instead, we now have the following APIs to get / add / update user content.
```java
public void addOrUpdateWorkflowUserContentMap(String workflowName, final Map contentToAddOrUpdate);
public void addOrUpdateJobUserContentMap(String workflowName, String jobName, final Map contentToAddOrUpdate);
public void addOrUpdateTaskUserContentMap(String workflowName, String jobName, String taskPartitionId, final Map contentToAddOrUpdate);
public Map getWorkflowUserContentMap(String workflowName);
public Map getJobUserContentMap(String workflowName, String jobName);
public Map getTaskUserContentMap(String workflowName, String jobName, String taskPartitionId);
```
The API for deleting user content is TBD but can use the same convention. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-user-content Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/283.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #283 commit b235c4ee5a82c5970d29e839317ea242813a58bc Author: Harry Zhang Date: 2018-10-04T18:25:08Z [HELIX-775] consolidate user content related apis for task driver ---
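To make the add-or-update semantics of the user content APIs in pull request #283 concrete, here is an in-memory sketch that mirrors the method names listed there. It does not touch Helix: `UserContentStoreSketch`, the `/`-joined scope keys, the `Map<String, String>` value type, and the choice to return an empty map for unknown scopes are all assumptions for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in; the real APIs live on the task driver.
final class UserContentStoreSketch {
  private final Map<String, Map<String, String>> contentByScope = new HashMap<>();

  // New keys are added, existing keys are overwritten, other keys are untouched.
  private void merge(String scope, Map<String, String> content) {
    contentByScope.computeIfAbsent(scope, k -> new HashMap<>()).putAll(content);
  }

  // Returns a read-only copy; an unknown scope yields an empty map (assumption).
  private Map<String, String> read(String scope) {
    Map<String, String> stored = contentByScope.get(scope);
    Map<String, String> copy = stored == null ? new HashMap<>() : new HashMap<>(stored);
    return Collections.unmodifiableMap(copy);
  }

  void addOrUpdateWorkflowUserContentMap(String workflowName, Map<String, String> content) {
    merge(workflowName, content);
  }

  void addOrUpdateJobUserContentMap(String workflowName, String jobName, Map<String, String> content) {
    merge(workflowName + "/" + jobName, content);
  }

  void addOrUpdateTaskUserContentMap(String workflowName, String jobName, String taskPartitionId,
      Map<String, String> content) {
    merge(workflowName + "/" + jobName + "/" + taskPartitionId, content);
  }

  Map<String, String> getWorkflowUserContentMap(String workflowName) {
    return read(workflowName);
  }

  Map<String, String> getJobUserContentMap(String workflowName, String jobName) {
    return read(workflowName + "/" + jobName);
  }

  Map<String, String> getTaskUserContentMap(String workflowName, String jobName, String taskPartitionId) {
    return read(workflowName + "/" + jobName + "/" + taskPartitionId);
  }
}
```

The point of the sketch is the scoping: workflow, job, and task content are kept apart, and a second call with an overlapping key updates that key in place.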
[GitHub] helix pull request #282: [HELIX-775] add task driver support for helix rest ...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/282 [HELIX-775] add task driver support for helix rest to add/get task framework user content consolidate user content related apis for task driver To consolidate the task driver user content related APIs and the corresponding REST APIs, I'm deprecating the general getUserContent() API; instead, we now have the following APIs to get / add / update user content.
```java
public void addOrUpdateWorkflowUserContentMap(String workflowName, final Map contentToAddOrUpdate);
public void addOrUpdateJobUserContentMap(String workflowName, String jobName, final Map contentToAddOrUpdate);
public void addOrUpdateTaskUserContentMap(String workflowName, String jobName, String taskPartitionId, final Map contentToAddOrUpdate);
public Map getWorkflowUserContentMap(String workflowName);
public Map getJobUserContentMap(String workflowName, String jobName);
public Map getTaskUserContentMap(String workflowName, String jobName, String taskPartitionId);
```
The API for deleting user content is TBD but can use the same convention. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-user-content Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/282.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #282 commit 7ec5313bccb679014d6a0605ee5d7184063e555e Author: Harry Zhang Date: 2018-10-31T20:55:44Z [HELIX-775] add task driver support for helix rest to add/get task framework user content ---
[GitHub] helix pull request #281: [HELIX-773] add getLastScheduledTaskTimestamp infor...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/281 [HELIX-773] add getLastScheduledTaskTimestamp information in workflow rest API - Added TaskExecutionInfo object to wrap task execution information - added TaskExecutionInfo to last scheduled task in workflow property in workflow rest API - Modified related tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/workflow-rest Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/281.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #281 commit 917f6b7ee1b2b44b10eea7e5de7f07aa7f184618 Author: Harry Zhang Date: 2018-10-30T23:43:25Z [HELIX-773] add getLastScheduledTaskTimestamp information in workflow rest api ---
[GitHub] helix pull request #280: [HELIX-772] add TaskDriver.addUserContent() api and...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/280 [HELIX-772] add TaskDriver.addUserContent() api and related tests Implemented TaskDriver.addUserContent() Added test (TestGetSetUserContentStore) for testing all getter/setter for user content Modified unstable TestIndependentTaskRebalancer You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/add-user-content Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/280.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #280 commit df24f5975bd517626490f14e6e038f8370ddd815 Author: Harry Zhang Date: 2018-10-30T23:25:12Z [HELIX-772] add TaskDriver.addUserContent() api and related tests ---
[GitHub] helix pull request #278: [HELIX-771] More detailed top state handoff metrics
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/278 [HELIX-771] More detailed top state handoff metrics Added more details about top state handoff to distinguish Helix latency from user latency. We define 2 types of handoff: - Graceful handoff (controlled top state handoff, e.g. disable instance, load balance, etc.) - Non-graceful handoff (uncontrolled top state handoff, e.g. node crash, etc.) For graceful handoff, we record total handoff latency and user latency. For non-graceful handoff, we record total handoff latency only. Moved top state handoff metrics to an independent stage to make the logic cleaner. Refactored TestTopStateHandoffMetrics to make it cleaner and to use JSON more natively. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/topstate-metrics Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/278.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #278 commit 7e49f995e29ea200fcc42ce6af148ed521979f5c Author: Harry Zhang Date: 2018-10-30T22:55:20Z [HELIX-771] More detailed top state handoff metrics ---
[GitHub] helix pull request #266: Propose design for aggregated cluster view service
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/266#discussion_r227597163 --- Diff: designs/aggregated-cluster-view/design.md --- @@ -0,0 +1,353 @@ +Aggregated Cluster View Design +== + +## Introduction +Currently Helix organize information by cluster - clusters are autonomous entities that holds resource / node information. +In real practice, a helix client might need to access aggregated information of helix clusters from different data center regions for management or coordination purpose. +This design proposes a service in Helix ecosystem for clients to retrieve cross-datacenter information in a more efficient way. + + +## Problem Statement +We identified a couple of use cases for accessing cross datacenter information. [Ambry](https://github.com/linkedin/ambry) is one of them. --- End diff -- Sure (will also update the design doc about it). Ambry uses a Helix spectator in both its router (for retrying get requests remotely if they fail locally) and its storage node (for data replication purposes). Given the number of clients that need global information, it would be more cost-effective for them if aggregated information is provided locally. ---
[GitHub] helix pull request #266: Propose design for aggregated cluster view service
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/266#discussion_r227562948 --- Diff: designs/aggregated-cluster-view/design.md --- @@ -0,0 +1,353 @@ +Aggregated Cluster View Design +== + +## Introduction +Currently Helix organize information by cluster - clusters are autonomous entities that holds resource / node information. +In real practice, a helix client might need to access aggregated information of helix clusters from different data center regions for management or coordination purpose. +This design proposes a service in Helix ecosystem for clients to retrieve cross-datacenter information in a more efficient way. + + +## Problem Statement +We identified a couple of use cases for accessing cross datacenter information. [Ambry](https://github.com/linkedin/ambry) is one of them. +Here is a simplified example: some service has Helix cluster "MyDBCluster" in 3 data centers respectively, and each cluster has a resource named "MyDB". +To federate this "MyDBCluster", current usage is to have each federation client (usually Helix spectator) to connect to metadata store endpoints in all fabrics to retrieve information and aggregate them locally. 
+Such usge has the following drawbacks: + +* As there are a lot of clients in each DC that need cross-dc information, there are a lot of expensive cross-dc traffics +* Every client needs to know information about metadata stores in all fabrics which + * Increases operational cost when these information changes + * Increases security concern by allowing cross data center traffic + +To solve the problem, we have the following requirements: +* Clients should still be able to GET/WATCH aggregated information from 1 or more metadata stores (likely but not necessarily from different data centers) +* Cross DC traffic should be minimized +* Reduce amount of information about data center that a client needs +* Agility of information aggregation can be configured +* Currently, it's good enough to have only LiveInstance, InstanceConfig, and ExternalView aggregated + + + + + +## Proposed Design + +To provide aggregated cluster view, the solution I'm proposing is to add a special type of cluster, i.e. **View Cluster**. +View cluster leverages current Helix semantics to store aggregated information of various **Source Clusters**. +There will be another micro service (Helix View Aggregator) running, fetching information from clusters (likely from other data centers) to be aggregated, and store then to the view cluster. --- End diff -- Although setting up an observer local to clients can potentially reduce cross data center traffic, it has a few drawbacks: 1. All data changes are propagated immediately, and if such information is not required frequently, there will be wasted traffic. Building a service makes it possible to customize aggregation granularity 2. Using a ZooKeeper observer leaves the aggregation logic to the client - providing aggregated data makes it easier for users to consume 3. Building a service leaves room to customize the aggregated data in the future, e.g. 
if we want to aggregate ideal state, we might not need to aggregate the preference list, etc. Will add these points to the design doc ---
[GitHub] helix pull request #270: [HELIX-753] Record top state handoff finished in si...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/270 [HELIX-753] Record top state handoff finished in single cluster data cache refresh This PR adds top state handoff reporting for the case where a single pipeline refresh catches the entire handoff process, which we missed before. Here is the rough procedure: - retrieve the cached last top state instance for a partition - retrieve the current top state instance for a partition - if there is no missing top state record for that partition and the top state instance changed, we record the handoff The current top state end time is easy to find from the current state in the cluster data cache. For the handoff start time, if we cannot find it, we use the last pipeline run's end time as a best guess. The detailed reasoning is explained in a code comment. Added a test case to verify such top state handoffs, and consolidated the common parts of TestTopStateHandoffMetrics to avoid code duplication. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/topstate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/270.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #270 commit d501e8fa30596d9cd98078f0d1ce7c1ecf20c595 Author: Harry Zhang Date: 2018-09-21T21:32:15Z [HELIX-753] Record top state handoff finished in single cluster data cache refresh ---
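The rough procedure in pull request #270 can be written down as a single decision function. This is a sketch of the description only, not the stage's actual code: `SingleRefreshHandoffSketch`, its parameters, and the convention that a negative `handoffStartMs` means "start time not found" are assumptions.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the decision described in the PR text.
final class SingleRefreshHandoffSketch {
  /**
   * Returns the handoff duration to report, or empty when this code path has nothing to report.
   *
   * @param handoffStartMs negative when the start time could not be found (assumed sentinel)
   */
  static OptionalLong handoffDurationMs(String lastTopStateInstance, String currentTopStateInstance,
      boolean missingTopStateRecorded, long handoffStartMs, long lastPipelineEndMs,
      long currentTopStateEndMs) {
    if (missingTopStateRecorded) {
      // A missing top state record already exists, so this path does not apply.
      return OptionalLong.empty();
    }
    if (lastTopStateInstance == null || currentTopStateInstance == null
        || lastTopStateInstance.equals(currentTopStateInstance)) {
      // No change of top state instance between the two refreshes.
      return OptionalLong.empty();
    }
    // Best guess from the PR: fall back to the end of the last pipeline run.
    long start = handoffStartMs >= 0 ? handoffStartMs : lastPipelineEndMs;
    return OptionalLong.of(currentTopStateEndMs - start);
  }
}
```

For instance, with an unknown start time, a last pipeline end of 1000 and a current top state end of 1250, the sketch reports 250 ms.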
[GitHub] helix pull request #266: Propose design for aggregated cluster view service
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/266 Propose design for aggregated cluster view service This PR adds a design doc for aggregated cluster view service. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/view-aggregator-design Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/266.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #266 commit d7e6c1e0d51229319094e025ad6b70f5d5deed3e Author: Harry Zhang Date: 2018-08-21T02:11:14Z Propose design for aggregated cluster view service ---
[GitHub] helix pull request #258: [HELIX-741] make swap instance more robust and idem...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/258 [HELIX-741] make swap instance more robust and idempotent Made swap instance more robust: 1. List ideal state names and read ideal state individually to avoid partial read 2. remove redundant logics that test old instance status 3. make it idempotent 4. added test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/helix-admin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/258.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #258 commit 24c52394dfff91c045367260c969f76560ebeb62 Author: Harry Zhang Date: 2018-07-18T01:21:48Z [HELIX-741] make swap instance more robust and idempotent ---
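The idempotency goal in pull request #258 can be illustrated on a single preference list. This sketch is not the admin implementation and does not model reading ideal states one by one; `SwapInstanceSketch` and its `swap` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: replaces one instance name in a preference list.
final class SwapInstanceSketch {
  /** Returns a copy of the list with every occurrence of oldInstance replaced by newInstance. */
  static List<String> swap(List<String> preferenceList, String oldInstance, String newInstance) {
    List<String> result = new ArrayList<>(preferenceList.size());
    for (String instance : preferenceList) {
      result.add(instance.equals(oldInstance) ? newInstance : instance);
    }
    return result;
  }
}
```

Once the old name no longer appears, applying the same swap again changes nothing, so a retry after a partial failure yields the same final list.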
[GitHub] helix pull request #257: [HELIX-740] check NPE in getInstancesInClusterWithT...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/257 [HELIX-740] check NPE in getInstancesInClusterWithTag and throw more meaningful exception Added cluster config check in `getInstancesInClusterWithTag()` and throw IllegalStateException when instance config is missing You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/helix-admin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/257.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #257 commit f4bb7d60782150c7d713c907211cc9d41f002c48 Author: Harry Zhang Date: 2018-07-17T22:50:02Z [HELIX-740] check NPE in getInstancesInClusterWithTag and throw more meaningful exception ---
[GitHub] helix issue #248: helix manager should support getting metadata store connec...
Github user zhan849 commented on the issue: https://github.com/apache/helix/pull/248 @kishoreg We have a request here: it would be handy to retrieve the zk address from ZkHelixManager so that user components can perform customized operations in ZooKeeper without sharing the same ZkClient with the Helix component. Since the zk address is part of ZkHelixManager's configuration, adding a getter here fits the semantics. To make it more general (and to introduce less code change), such a method should be part of the HelixManager interface, and "MetadataStore" is a more generic name to use here. ---
[GitHub] helix pull request #248: helix manager should support getting metadata store...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/248 helix manager should support getting metadata store connection string Add an API to get metadatastore connection string in Helix Manager You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/helix-manager Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/248.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #248 commit 921d1fc9822a2ae3ddd2adc854a12ec486ad6c08 Author: Harry Zhang Date: 2018-07-16T22:23:30Z helix manager should support getting metadata store connection string ---
[GitHub] helix pull request #224: [HELIX-718] implement ThreadCountBasedTaskAssigner
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/224 [HELIX-718] implement ThreadCountBasedTaskAssigner In this RB, I implemented a thread count based task assigner that is optimized for short-term use cases. It assumes: - All tasks to assign have the same quota type - All tasks to assign require only 1 thread The algorithm makes a best effort to spread out tasks of the same type / same job, e.g.: - if there are 3 nodes, each with 10 threads for each of quota types A, B, and C - node1 is empty, node2 and node3 each have 5 typeB tasks and 5 typeC tasks running => when 3 typeA tasks are to be assigned, it assigns 1 typeA task to each node rather than squeezing all 3 typeA tasks onto node1. Added tests for the assigner. Below are the profiling results; each result is the average of 100 trials:
Assign 50K tasks onto 1K nodes:
- batch size 1: average time 118ms
- batch size 5000: average time 114ms
- batch size 2000: average time 117ms
- batch size 1000: average time 119ms
- batch size 500: average time 123ms
- batch size 100: average time 182ms
Assign 10K tasks onto 1K nodes:
- batch size 1: average time 25ms
- batch size 5000: average time 21ms
- batch size 2000: average time 22ms
- batch size 1000: average time 25ms
- batch size 500: average time 22ms
- batch size 100: average time 34ms
You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/simple-assigner Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/224.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #224 commit 6cb574d5aea6ca9cb9e6b5184bc80cb5e05d53b8 Author: Harry Zhang Date: 2018-07-09T23:04:19Z [HELIX-718] implement ThreadCountBasedTaskAssigner ---
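One way to get the spreading behavior described in pull request #224 is a greedy assignment that always picks the instance with the most free threads for the quota type being assigned. The PR text does not say that this is how ThreadCountBasedTaskAssigner works, so treat the following as a sketch of the idea under the PR's two assumptions (one quota type per call, one thread per task); `SpreadAssignerSketch` and its `assign` method are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: greedy "most free threads first" assignment for one quota type.
final class SpreadAssignerSketch {
  /**
   * @param taskIds     tasks to place, each needing exactly one thread
   * @param freeThreads free threads per instance for the quota type being assigned
   * @return task id -> instance name, in assignment order; tasks that do not fit are omitted
   */
  static Map<String, String> assign(List<String> taskIds, Map<String, Integer> freeThreads) {
    Map<String, Integer> remaining = new HashMap<>(freeThreads);
    // Max-heap on remaining free threads; ties broken by name to keep the result deterministic.
    PriorityQueue<String> heap = new PriorityQueue<>((a, b) -> {
      int byFree = Integer.compare(remaining.get(b), remaining.get(a));
      return byFree != 0 ? byFree : a.compareTo(b);
    });
    for (Map.Entry<String, Integer> e : remaining.entrySet()) {
      if (e.getValue() > 0) {
        heap.add(e.getKey());
      }
    }
    Map<String, String> assignment = new LinkedHashMap<>();
    for (String task : taskIds) {
      String instance = heap.poll();
      if (instance == null) {
        break; // no capacity left anywhere
      }
      assignment.put(task, instance);
      // The count is changed only while the instance is outside the heap.
      int left = remaining.get(instance) - 1;
      remaining.put(instance, left);
      if (left > 0) {
        heap.add(instance);
      }
    }
    return assignment;
  }
}
```

In the PR's example all three nodes have 10 free typeA threads, so three typeA tasks end up on three different nodes even though node1 is otherwise empty.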
[GitHub] helix pull request #223: [HELIX-718] provide a method in AssignableInstance ...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/223 [HELIX-718] provide a method in AssignableInstance to set current assignment This is required because, when an assignable instance is initialized, it needs to recover its current state You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/assignable-instance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/223.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #223 commit e44b29e03ef4c807e940cde717ed2f6fff58a273 Author: Harry Zhang Date: 2018-07-09T22:59:27Z [HELIX-718] provide a method in AssignableInstance to set current assignments ---
[GitHub] helix pull request #220: [HELIX-718] implement TaskAssignResult
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/220 [HELIX-718] implement TaskAssignResult Implement TaskAssignResult as a part of task assigner You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-assign-result Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/220.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #220 commit 701947d5a033792f21dd2796a29577702782fd26 Author: Harry Zhang Date: 2018-07-09T21:22:20Z [HELIX-718] implement TaskAssignResult ---
[GitHub] helix pull request #219: [HELIX-717] Add api for get / set quota type, ratio...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/219 [HELIX-717] Add api for get / set quota type, ratio and participant capacity Add api for get / set quota type, ratio and participant capacity You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/task-quota Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/219.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #219 commit 9ff603e9c39b53d5035cccb31fcf6edf82d97f18 Author: Harry Zhang Date: 2018-07-09T21:07:56Z [HELIX-717] Add api for get / set quota type, ratio and participant capacity ---
[GitHub] helix pull request #218: minor logging improvements
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/218 minor logging improvements minor log fixes You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/minor-improvements Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/218.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #218 commit 3870ab0f31a8f8a44ac5816ed0bde38fc7433bd0 Author: Harry Zhang Date: 2018-07-09T20:56:57Z minor logging improvements ---
[GitHub] helix pull request #214: [HELIX-709] Move external view calculation to async...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/214 [HELIX-709] Move external view calculation to async stage and re-organize pipeline - Separated controller pipeline to execute external view compute async and as early as possible - renamed AbstractAsyncBaseStage - fixed NPE in callback handler - all tests passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/async-ev Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/214.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #214 commit 542fbc840a167986a40bd57f3c5660d294acb63c Author: Harry Zhang Date: 2018-07-09T19:16:56Z [HELIX-709] Move external view calculation to async stage and re-organize pipeline ---
[GitHub] helix pull request #209: [HELIX-710] Create abstract state model for distrib...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/209 [HELIX-710] Create abstract state model for distributed leader standby helix service This RB abstracts a leader standby state model that helix services such as controller or other services would commonly use. This reduces duplicated code and simplifies state model implementation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/abstract-ls-state-model Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/209.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #209 commit 4a99bc43c6f22e478a49fb7f2bbac42d608f17b5 Author: Harry Zhang Date: 2018-06-28T21:32:51Z [HELIX-710] Create abstract state model for distributed leader standby helix service ---
[GitHub] helix pull request #208: [HELIX-709] Prepare controller stages for async exe...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/208 [HELIX-709] Prepare controller stages for async execution - Implemented AbstractAsyncBaseStage - Refactored TEVCalcState and PersistAssignmentStage to use AbstractAsyncBaseStage You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/aabs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/208.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #208 commit 9080c64429d724aa959207411ca06d690f5ee840 Author: Harry Zhang Date: 2018-06-28T21:25:21Z [HELIX-709] Prepare controller stages for async execution ---
[GitHub] helix pull request #206: [HELIX-706] process tev and persist assignment asyn...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/206 [HELIX-706] process tev and persist assignment asynchronously Added async worker in generic helix controller to process persist assignment stage and tev generation state asynchronously You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/async-ev Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/206.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #206 commit bb4ffd7e5663377427a5ad5988948659dd0db378 Author: Harry Zhang Date: 2018-06-26T23:05:50Z [HELIX-706] process tev and persist assignment asynchronously ---
[GitHub] helix pull request #204: [HELIX-705]: Participant duplicated state transitio...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/204 [HELIX-705]: Participant duplicated state transition handling rework Re-implemented helix task executor state transition message dedup logic, and added tests for verifying it: - Duplicated message in same batch: discard the later one - Duplicated message in different batches: the later one should be discarded if the first one is in progress - During state transition, we should not rely on current state delta to get partition's current state, but should lock on state model def (thread safety) - Duplicated state transition (toState == currentState) should not result in error, which is confusing, but should report success You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/participant-st-dedup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/204.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #204 commit 04f1ba9701ccfb4c55d44ab4bc159577c3afd68b Author: Harry Zhang Date: 2018-06-25T22:55:14Z [HELIX-705]: Participant duplicated state transition handling rework ---
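The dedup rules listed in #204 can be illustrated with a small sketch. This is not the HelixTaskExecutor code: `TransitionDedupSketch`, `Msg`, and the "resource/partition" string key are hypothetical names made up for illustration, and the sketch assumes a duplicate is simply "a second transition message for a partition that already has one accepted or in progress".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only; all names here are hypothetical, not Helix APIs. */
final class TransitionDedupSketch {

  /** Hypothetical stand-in for a state transition message. */
  static final class Msg {
    final String id;
    final String partitionKey; // assumed "resource/partition" key
    final String toState;

    Msg(String id, String partitionKey, String toState) {
      this.id = id;
      this.partitionKey = partitionKey;
      this.toState = toState;
    }
  }

  // Partitions that currently have an accepted or in-progress transition.
  private final Set<String> inProgress = new HashSet<>();

  /**
   * Returns the messages of a batch that should be executed. A later message
   * for the same partition is discarded, whether the earlier one arrived in
   * this batch or in a previous batch that has not completed yet.
   */
  synchronized List<Msg> accept(List<Msg> batch) {
    List<Msg> accepted = new ArrayList<>();
    for (Msg m : batch) {
      // Set.add returns false if the partition is already being handled.
      if (inProgress.add(m.partitionKey)) {
        accepted.add(m);
      }
    }
    return accepted;
  }

  /** Marks the transition for a partition as finished. */
  synchronized void complete(String partitionKey) {
    inProgress.remove(partitionKey);
  }

  /** A transition whose target equals the current state is a no-op success, not an error. */
  static boolean isNoOpSuccess(String currentState, String toState) {
    return currentState.equals(toState);
  }
}
```

Both methods that touch the in-progress set are synchronized, which loosely mirrors the thread-safety point in the description; the real change locks on the state model instead.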
[GitHub] helix pull request #200: throw new Exception to avoid ugly NPE
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/200#discussion_r185911985 --- Diff: helix-core/src/main/java/org/apache/helix/tools/commandtools/ZkGrep.java --- @@ -463,9 +463,8 @@ static File gunzip(File zipFile) { return outputFile; } catch (IOException e) { LOG.error("fail to gunzip file: " + zipFile, e); + throw new Exception("fail to gunzip file" + zipFile); --- End diff -- @lujiefsi might not be a good idea to only check lastZkSnapshot, because gunzip() is used in multiple places and you need to check all of them. ---
[GitHub] helix issue #201: add null check in DeplayedAutoRebalancerz#computeNewIdealS...
Github user zhan849 commented on the issue: https://github.com/apache/helix/pull/201 @lujiefsi currently we have the assumption that a resource will have a state model def, which is registered by ParticipantManager. If you really want to fix the issue, then I'd suggest doing the following: - in computeNewIdealState, log an error when we cannot find the state model def, and mark all partitions as error; ResourceMonitor needs to be updated accordingly. Don't throw an exception as this will block rebalancing for all other valid resources - WorkflowConfig's start time is fetched from its ScheduleConfig, which is enforced by the builder (if no start time is provided, the builder will fail the build). So we can assume it is always there. Similarly, if you really want to add a check (assuming someone did not use our API to create the object), don't throw an exception; log an error and record the failure in the workflow monitor ---
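The "log and mark as error instead of throwing" suggestion in the #201 comment could look roughly like the sketch below. It is not the rebalancer code: `MissingStateModelSketch`, its `compute` method, and the placeholder state names are hypothetical, and the maps stand in for whatever cluster data the real `computeNewIdealState` reads.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

/** Illustrative only; names and data shapes are assumptions, not Helix APIs. */
final class MissingStateModelSketch {
  private static final Logger LOG = Logger.getLogger("MissingStateModelSketch");

  /**
   * @param stateModelRefs resource name to state model def ref
   * @param partitions     resource name to its partition names
   * @param registeredDefs state model defs that are actually registered
   * @return resource name to (partition to placeholder state)
   */
  static Map<String, Map<String, String>> compute(
      Map<String, String> stateModelRefs,
      Map<String, List<String>> partitions,
      Set<String> registeredDefs) {
    Map<String, Map<String, String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> e : partitions.entrySet()) {
      String resource = e.getKey();
      String ref = stateModelRefs.get(resource);
      boolean known = ref != null && registeredDefs.contains(ref);
      if (!known) {
        // Log and continue: throwing here would block every other resource.
        LOG.severe("No state model def found for resource " + resource + " (ref=" + ref + ")");
      }
      Map<String, String> states = new LinkedHashMap<>();
      for (String partition : e.getValue()) {
        // "ERROR" and "TO_BE_ASSIGNED" are placeholder labels for this sketch.
        states.put(partition, known ? "TO_BE_ASSIGNED" : "ERROR");
      }
      result.put(resource, states);
    }
    return result;
  }
}
```

The point of the design is isolation: one misconfigured resource ends up with all partitions in error, while valid resources in the same pass are still processed.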
[GitHub] helix pull request #200: throw new Exception to avoid ugly NPE
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/200#discussion_r185050120 --- Diff: helix-core/src/main/java/org/apache/helix/tools/commandtools/ZkGrep.java --- @@ -463,9 +463,8 @@ static File gunzip(File zipFile) { return outputFile; } catch (IOException e) { LOG.error("fail to gunzip file: " + zipFile, e); + throw new Exception("fail to gunzip file" + zipFile); --- End diff -- yes. Tooling is fine here as NPE is caught outside, and proper error messages are printed out. BTW, wrapping IOException using generic Exception will erase the proper semantics that IOException carries, which is not a good practice. ---
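To make the point about exception semantics concrete, here is a minimal sketch of a gunzip helper that adds context while keeping the failure an `IOException` with the original as its cause. It is not the `ZkGrep.gunzip` implementation: `GunzipSketch` and the two-argument signature are assumptions for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

/** Illustrative only; not the ZkGrep implementation. */
final class GunzipSketch {

  /**
   * Decompresses zipFile into outputFile. On failure the caller still gets an
   * IOException (with the original exception as the cause), so callers that
   * catch IOException keep working and no information is lost.
   */
  static File gunzip(File zipFile, File outputFile) throws IOException {
    try (InputStream in = new GZIPInputStream(new FileInputStream(zipFile));
         OutputStream out = new FileOutputStream(outputFile)) {
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return outputFile;
    } catch (IOException e) {
      // Add context, keep the type, and chain the cause.
      throw new IOException("fail to gunzip file: " + zipFile, e);
    }
  }
}
```

Compared with `throw new Exception(...)`, callers do not have to widen their `catch` clauses, and the stack trace of the underlying I/O failure is preserved.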
[GitHub] helix issue #201: add null check in DeplayedAutoRebalancerz#computeNewIdealS...
Github user zhan849 commented on the issue: https://github.com/apache/helix/pull/201 1. IDEs are already doing NPE checking for us. 2. You are just detecting null and throw another exception, how's it different than NPE? ---
[GitHub] helix pull request #195: [HELIX-682] delete duplicated message and log error...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/195 [HELIX-682] delete duplicated message and log error in HelixTaskExecutor on participant This PR is the second part of message dedup on participant side You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/participant-msg-dedup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/195.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #195 commit 8aba9bea0734da11722fbc8cceb74f34dd6a37c6 Author: Harry Zhang <zhan849@...> Date: 2018-04-24T22:34:08Z [HELIX-682] delete duplicated message and log error in HelixTaskExecutor on participant ---
[GitHub] helix pull request #194: Fix broken TestWorkflowTermination
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/194 Fix broken TestWorkflowTermination The test was broken by an earlier temp fix that stopped setting JobState to NOT_STARTED when initializing the workflow context. This PR fixes the test to match that temp fix You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/test-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/194.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #194 commit 1d4df9c0739b10be4802dbe36f9870f097a21e6f Author: Harry Zhang <zhan849@...> Date: 2018-04-24T19:47:28Z Fix broken TestWorkflowTermination ---
[GitHub] helix pull request #184: fix broken TestTaskCreateThrottling
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/184 fix broken TestTaskCreateThrottling this PR fixes a broken test You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/test-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/184.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #184 commit 0d7bbfc2d181231d354a37fa0c8bdcfa22f6a07d Author: Harry Zhang <zhan849@...> Date: 2018-04-19T18:42:01Z fix broken TestTaskCreateThrottling ---
[GitHub] helix issue #180: Two minor fixes
Github user zhan849 commented on the issue: https://github.com/apache/helix/pull/180 @lei-xia done ---
[GitHub] helix pull request #182: [HELIX-695] add helix manager listener for new conn...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/182 [HELIX-695] add helix manager listener for new connection notification In this PR I added invocation and related tests of `stateListener.onConnected()` method in ZkHelixManager when it is connected. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/helix-manager-onconnected Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/182.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #182 commit 65e84713503437c542e545abd521c2ba6d26 Author: Harry Zhang <zhan849@...> Date: 2018-04-16T17:05:30Z [HELIX-695] add helix manager listener for new connection notification ---
[GitHub] helix pull request #174: [HELIX-691] Allow users to update InstanceConfig
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/174#discussion_r180175528 --- Diff: helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java --- @@ -223,60 +224,60 @@ public Response updateInstance(@PathParam("clusterId") String clusterId, } switch (cmd) { - case enable: -admin.enableInstance(clusterId, instanceName, true); -break; - case disable: -admin.enableInstance(clusterId, instanceName, false); -break; - case reset: -if (!validInstance(node, instanceName)) { - return badRequest("Instance names are not match!"); -} -admin.resetPartition(clusterId, instanceName, -node.get(InstanceProperties.resource.name()).toString(), (List) OBJECT_MAPPER - .readValue(node.get(InstanceProperties.partitions.name()).toString(), -OBJECT_MAPPER.getTypeFactory() -.constructCollectionType(List.class, String.class))); -break; - case addInstanceTag: -if (!validInstance(node, instanceName)) { - return badRequest("Instance names are not match!"); -} -for (String tag : (List) OBJECT_MAPPER - .readValue(node.get(InstanceProperties.instanceTags.name()).toString(), - OBJECT_MAPPER.getTypeFactory().constructCollectionType(List.class, String.class))) { - admin.addInstanceTag(clusterId, instanceName, tag); -} -break; - case removeInstanceTag: -if (!validInstance(node, instanceName)) { - return badRequest("Instance names are not match!"); -} -for (String tag : (List) OBJECT_MAPPER - .readValue(node.get(InstanceProperties.instanceTags.name()).toString(), - OBJECT_MAPPER.getTypeFactory().constructCollectionType(List.class, String.class))) { - admin.removeInstanceTag(clusterId, instanceName, tag); -} -break; - case enablePartitions: -admin.enablePartition(true, clusterId, instanceName, -node.get(InstanceProperties.resource.name()).getTextValue(), -(List) OBJECT_MAPPER - .readValue(node.get(InstanceProperties.partitions.name()).toString(), -OBJECT_MAPPER.getTypeFactory() -.constructCollectionType(List.class, 
String.class))); -break; - case disablePartitions: -admin.enablePartition(false, clusterId, instanceName, -node.get(InstanceProperties.resource.name()).getTextValue(), -(List) OBJECT_MAPPER - .readValue(node.get(InstanceProperties.partitions.name()).toString(), - OBJECT_MAPPER.getTypeFactory().constructCollectionType(List.class, String.class))); -break; - default: -_logger.error("Unsupported command :" + command); -return badRequest("Unsupported command :" + command); +case enable: --- End diff -- Helix's formatter does not indent case, could you pls revert it back? Same for other places ---
[GitHub] helix pull request #174: [HELIX-691] Allow users to update InstanceConfig
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/174#discussion_r180174778 --- Diff: helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/ResourceAccessor.java --- @@ -84,10 +96,66 @@ public Response getResources(@PathParam("clusterId") String clusterId) { return JSONRepresentation(root); } + /** --- End diff -- Partition health related changes are not part of this PR (allow user to change instance config), can we file different issues? ---
[GitHub] helix pull request #174: [HELIX-691] Allow users to update InstanceConfig
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/174#discussion_r180178181 --- Diff: helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstanceAccessor.java --- @@ -315,26 +316,27 @@ public Response getInstanceConfig(@PathParam("clusterId") String clusterId, return notFound(); } - @PUT + @POST --- End diff -- PUT is for "set" and POST is for "patch", I'd suggest we keep both. @dasahcc thoughts? ---
[GitHub] helix pull request #175: [HELIX-692] use map instead of list to avoid deleti...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/175 [HELIX-692] use map instead of list to avoid deleting redundant message during cleanup Currently in MessageGenerationPhase, we use a list to store messages to GC. However, pending messages are stored per resource/partition/instance, and under batch message mode the same message is stored once for each partition in the batch, so we end up cleaning up the same message many times. This RB changes the list to a map to avoid redundant cleanup You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/HELIX-692 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/175.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #175 commit ce7d2e9d275e1403375edd94d63afa94ea1a2234 Author: Harry Zhang <zhan849@...> Date: 2018-04-06T23:27:02Z [HELIX-692] use map instead of list to avoid deleting redundant message during cleanup ---
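The list-to-map change in #175 boils down to keying the cleanup set by message id. A minimal sketch, assuming each pending entry carries the id of the message it belongs to; `CleanupDedupSketch`, `Msg`, and `collectForCleanup` are hypothetical names, not the MessageGenerationPhase code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; names are hypothetical, not Helix APIs. */
final class CleanupDedupSketch {

  /** Hypothetical stand-in for a pending message seen on one partition. */
  static final class Msg {
    final String id;
    final String partition;

    Msg(String id, String partition) {
      this.id = id;
      this.partition = partition;
    }
  }

  /**
   * In batch mode the same message shows up once per partition. Keying by
   * message id keeps a single entry per message, so each message is cleaned
   * up once instead of once per partition.
   */
  static Map<String, Msg> collectForCleanup(List<Msg> pendingPerPartition) {
    Map<String, Msg> toCleanUp = new LinkedHashMap<>();
    for (Msg m : pendingPerPartition) {
      toCleanUp.putIfAbsent(m.id, m);
    }
    return toCleanUp;
  }
}
```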
[GitHub] helix pull request #173: [HELIX-689] remove redundant logs from zkclient
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/173 [HELIX-689] remove redundant logs from zkclient Currently, in controller message cleanup, we print out 2 lines of message when message does not exist, which is totally redundant. In this PR, I removed the warning message from controller, and added error message in zkclient only when there is real error (exception from below). If we fail to delete a ZNode because znode does not exist, we do not print out message any more except debug mode You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/ctl-msg-cleanup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/173.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #173 commit d96a40caf19efffed3939b6dd8d9efe40734ec15 Author: Harry Zhang <zhan849@...> Date: 2018-04-03T21:22:53Z [HELIX-689] remove redundant logs from zkclient ---
[GitHub] helix pull request #162: [HELIX-683] clean monitoring cache upon helix contr...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/162 [HELIX-683] clean monitoring cache upon helix controller enable monitoring In this PR I added methods to clear monitoring records in cache when we enable cluster status monitoring. I also added tests to reproduce the situation where a resource misses top state, the controller loses leadership, the resource regains top state, and the controller regains leadership, which causes a metrics reporting problem You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/controller-monitor-cache-cleanup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/162.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #162 commit 373da77547fa1ea4a39c760e80da75e9d453d4f5 Author: Harry Zhang <zhan849@...> Date: 2018-03-26T19:14:07Z [HELIX-683] clean monitoring cache upon helix controller enable monitoring ---
[GitHub] helix pull request #156: [HELIX-682] controller should delete obsolete messa...
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/156#discussion_r176498024 --- Diff: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java --- @@ -121,6 +131,18 @@ public void process(ClusterEvent event) throws Exception { Message message = null; + if (shouldCleanUpPendingMessage(pendingMessage, currentState, + currentStateOutput.getEndTime(resourceName, partition, instanceName))) { +logger.info( +"Adding pending message {} on instance {} to GC. Msg: {}->{}, current state of resource {}:{} is {}", --- End diff -- changed it to "cleanup" ---
[GitHub] helix pull request #152: [HELIX-681] don't fail state transition task if we ...
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/152#discussion_r176204863 --- Diff: helix-core/src/main/java/org/apache/helix/util/HelixUtil.java --- @@ -219,4 +220,22 @@ public static String serializeByComma(List objects) { return idealStateMap; } + + /** + * Remove the given message from ZK using the given accessor. This function will + * not throw exception + * @param accessor HelixDataAccessor + * @param msg message to remove + * @param instanceName name of the instance on which the message sits + * @return true if success else false + */ + public static boolean removeMessageFromZK(HelixDataAccessor accessor, Message msg, + String instanceName) { +try { + return accessor.removeProperty(msg.getKey(accessor.keyBuilder(), instanceName)); +} catch (Exception e) { --- End diff -- it will not. the reason I did a general try-catch here is because I want to keep removeProperty semantics (only return true/false) here, but we do have leaked exception in underlying implementations of removeProperty() ---
[GitHub] helix pull request #152: [HELIX-681] don't fail state transition task if we ...
Github user zhan849 commented on a diff in the pull request: https://github.com/apache/helix/pull/152#discussion_r176182830 --- Diff: helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTask.java --- @@ -168,7 +169,14 @@ public HelixTaskResult call() { // forward relay messages attached to this message to other participants if (taskResult.isSuccess()) { -forwardRelayMessages(accessor, _message, taskResult.getCompleteTime()); +try { + forwardRelayMessages(accessor, _message, taskResult.getCompleteTime()); +} catch (Exception e) { + // Fail to send relay message should not result in a task execution failure + // Currently we don't log error to ZK to reduce writes as when accessor throws + // exception, ZK might not be in good condition. + logger.error("Failed to send relay messages.", e); --- End diff -- will change ---
[GitHub] helix pull request #156: [HELIX-682] controller should delete obsolete messa...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/156 [HELIX-682] controller should delete obsolete messages with timeout to unblock state transition This RB contains implementations and tests for controller: during MessageGenerationPhase, it checks if the pending message should be cleaned up on participant to unblock further state transition: - If partition's current state is the same as message's toState and the 3sec timeout has already passed, it's likely that the participant failed to delete the message, so the controller should proactively remove it to unblock further rebalance - If partition's current state is the same as message's fromState, the partition is undergoing state transition or the state transition has not started yet, so we do nothing - If partition's current state is neither message's fromState nor toState (almost impossible), the message is a problematic one, and it is safe to delete it immediately so the participant does not undergo unnecessary message handling Message deletion on controller side is async You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/controller-msg-dedup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/156.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #156 commit 9f789dee0b17886bd97ebf4cc14e9d867043183d Author: Harry Zhang <zhan849@...> Date: 2018-03-21T01:47:02Z [HELIX-682] controller should delete obsolete messages with timeout to unblock state transition ---
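The three cases in the #156 description can be written as a small decision function. This is a sketch under assumptions, not the `shouldCleanUpPendingMessage` method from the PR: the class name, the parameter list, and the use of plain strings and millisecond timestamps are all made up for illustration; only the 3 second timeout and the three cases come from the description.

```java
/** Illustrative only; signature and names are assumptions, not the Helix method. */
final class PendingMessageCleanupSketch {

  // The description mentions a 3 second timeout.
  static final long CLEANUP_TIMEOUT_MS = 3000L;

  /**
   * @param fromState           fromState of the pending message
   * @param toState             toState of the pending message
   * @param currentState        partition's current state on the instance
   * @param transitionEndTimeMs when the partition reached its current state
   * @param nowMs               current time
   * @return true if the controller should remove the pending message
   */
  static boolean shouldCleanUp(String fromState, String toState, String currentState,
      long transitionEndTimeMs, long nowMs) {
    if (currentState.equals(toState)) {
      // Transition finished; give the participant time to delete its own message first.
      return nowMs - transitionEndTimeMs >= CLEANUP_TIMEOUT_MS;
    }
    if (currentState.equals(fromState)) {
      // Transition is in progress or has not started: leave the message alone.
      return false;
    }
    // Current state matches neither end of the transition: the message is stale.
    return true;
  }
}
```

For example, a SLAVE->MASTER message on a partition that has been MASTER for 4 seconds would be removed, while the same message on a partition still in SLAVE would be left in place.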
[GitHub] helix pull request #152: [HELIX-681] don't fail state transition task if we ...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/152 [HELIX-681] don't fail state transition task if we fail to remove message or send out relay message This PR includes a fix on the participant side: 1. Consolidated message deletion logic into HelixUtil, as we currently have duplicated logic in various places 2. When we fail to delete a message, we don't throw an exception to fail the task 3. When we fail to send out a relay message, we don't throw an exception to fail the task You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/HELIX-681 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/152.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #152 ---
[GitHub] helix pull request #148: Move RoutingDataCache to BasicDataCache as a sharab...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/148 Move RoutingDataCache to BasicDataCache as a sharable component In this commit, I moved the main logic of RoutingDataCache to BasicClusterDataCache under helix.common, to make it a commonly shareable component. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/cache-refactor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #148 commit baf8b830e787ba1b31d4f0ad2c07f3eb33a9208f Author: Harry Zhang <zhan849@...> Date: 2018-03-14T18:58:45Z Move RoutingDataCache to BasicDataCache as a sharable component ---
[GitHub] helix issue #146: [HELIX-680] add system setting to unblock TestZkCallbackHa...
Github user zhan849 commented on the issue: https://github.com/apache/helix/pull/146 @lei-xia just rebased ---
[GitHub] helix pull request #146: [HELIX-680] add system setting to unblock TestZkCal...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/146 [HELIX-680] add system setting to unblock TestZkCallbackHandlerLeak test with zookeeper 3.4.11 upgrade By adding system property in ZkUnitTestBase `beforeSuite()`, `TestZkCallbackHandlerLeak` can pass now You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/zk-test-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/146.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #146 commit e4de3c2247042754ed193789b8a8671012576e7d Author: hrzhang <hrzhang@...> Date: 2018-03-09T20:20:16Z [HELIX-680] add system setting to unblock TestZkCallbackHandlerLeak test with zookeeper 3.4.11 upgrade ---
[GitHub] helix pull request #140: [HELIX-679] consolidate semantics of recursively de...
GitHub user zhan849 opened a pull request: https://github.com/apache/helix/pull/140 [HELIX-679] consolidate semantics of recursively delete path in ZkClient This change consolidates semantics of APIs in ZkClient that recursively delete a path * For backward compatibility, we keep `deleteRecursive()`, which will only return true/false, and will not throw exception. * create a new method called `deleteRecursively()` that will only throw exception upon error. * mark `deleteRecursive()` as deprecated, as throwing an exception can carry error information * change all current usages of `deleteRecursive()` to `deleteRecursively()` You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhan849/helix harry/zk-client-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/helix/pull/140.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #140 commit 8412d3f7e7d8097eee820c4d055b1526ac74aca1 Author: hrzhang <hrzhang@...> Date: 2018-03-08T22:04:42Z [HELIX-679] consolidate semantics of recursively delete path in ZkClient ---
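The API shape described in #140 (a throwing method plus a deprecated boolean wrapper kept for compatibility) can be sketched as follows. To stay self-contained, the sketch deletes a local directory tree with `java.nio.file` instead of ZooKeeper nodes, so it is a stand-in for the pattern and not the ZkClient code; `RecursiveDeleteSketch` and the `Path` parameter are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only; uses the local filesystem as a stand-in for ZooKeeper paths. */
final class RecursiveDeleteSketch {

  /** New-style method: reports failure only by throwing, so error details are kept. */
  static void deleteRecursively(Path root) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts children before their parents.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  /**
   * Old-style method kept for backward compatibility: never throws and only
   * returns true or false, which hides why the deletion failed.
   */
  @Deprecated
  static boolean deleteRecursive(Path root) {
    try {
      deleteRecursively(root);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

Implementing the deprecated method on top of the new one keeps a single deletion code path, which is one way to get the consolidated semantics the PR describes.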