[GitHub] helix pull request #183: Fix race-condition issue that could block ZkClient ...
GitHub user lei-xia opened a pull request:

    https://github.com/apache/helix/pull/183

    Fix race-condition issue that could block ZkClient event thread in CallbackHandler.

    1) Replace native batch callback handling thread with DedupEventCallbackProcessor, so the queue is deduplicated when an event is enqueued, instead of at the time of dequeue.
    2) Shut down and clean up the processor properly when CallbackHandler is reset().
    3) Add a flag to indicate whether the CallbackHandler is reset. If reset, do not enqueue further events into the batch callback queue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lei-xia/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #183

commit a25305eb83c3226d028e39c27a70293c2576756e
Author: Lei Xia
Date: 2018-04-16T18:38:26Z

    Fix race-condition issue that could block ZkClient event thread in CallbackHandler.

---
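For readers following the thread, here is a minimal sketch of what "deduplicate at enqueue time" and "stop enqueuing after reset" can look like. It is not the Helix `DedupEventCallbackProcessor`; the class `DedupQueueSketch`, its nested `DedupQueue`, and the event keys are all hypothetical names chosen for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; this is not the Helix DedupEventCallbackProcessor.
final class DedupQueueSketch {

    /** FIFO queue holding at most one pending event per key. */
    static final class DedupQueue<K, E> {
        private final Map<K, E> pending = new LinkedHashMap<>();
        private boolean closed = false;

        /** Enqueues an event; returns false once the queue has been closed. */
        synchronized boolean offer(K key, E event) {
            if (closed) {
                return false; // handler was reset: drop further events
            }
            // Re-putting an existing key replaces its value but keeps its
            // original queue position, so duplicates collapse at enqueue time.
            pending.put(key, event);
            return true;
        }

        /** Removes and returns the oldest pending event, or null if empty. */
        synchronized E poll() {
            Iterator<Map.Entry<K, E>> it = pending.entrySet().iterator();
            if (!it.hasNext()) {
                return null;
            }
            E event = it.next().getValue();
            it.remove();
            return event;
        }

        synchronized int size() {
            return pending.size();
        }

        /** Marks the queue closed and discards anything still pending. */
        synchronized void close() {
            closed = true;
            pending.clear();
        }
    }

    public static void main(String[] args) {
        DedupQueue<String, String> queue = new DedupQueue<>();
        queue.offer("DATA_CHANGE", "event-1");
        queue.offer("CHILD_CHANGE", "event-2");
        queue.offer("DATA_CHANGE", "event-3"); // replaces event-1 in place
        System.out.println(queue.size());      // 2
        System.out.println(queue.poll());      // event-3
        System.out.println(queue.poll());      // event-2
        queue.close();
        System.out.println(queue.offer("DATA_CHANGE", "event-4")); // false
    }
}
```

Because the queue never holds more than one entry per key, a burst of identical callbacks cannot pile up behind a slow consumer, and the closed flag gives `reset()` a single place to cut off late arrivals.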
[GitHub] helix pull request #178: A few minor fixes.
Github user asfgit closed the pull request at: https://github.com/apache/helix/pull/178 ---
[GitHub] helix issue #179: Unique thread id for the threads that execute Tasks
Github user lei-xia commented on the issue: https://github.com/apache/helix/pull/179 Could you please rebase to the current HEAD of master branch? Thanks ---
[jira] [Commented] (HELIX-695) Add Helix Manager listener for new connection notification
[ https://issues.apache.org/jira/browse/HELIX-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439730#comment-16439730 ]

ASF GitHub Bot commented on HELIX-695:
--------------------------------------

GitHub user zhan849 opened a pull request:

    https://github.com/apache/helix/pull/182

    [HELIX-695] add helix manager listener for new connection notification

    In this PR I added the invocation of the `stateListener.onConnected()` method in ZkHelixManager when it is connected, along with related tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhan849/helix harry/helix-manager-onconnected

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #182

commit 65e84713503437c542e545abd521c2ba6d26
Author: Harry Zhang
Date: 2018-04-16T17:05:30Z

    [HELIX-695] add helix manager listener for new connection notification

> Add Helix Manager listener for new connection notification
> ----------------------------------------------------------
>
>                 Key: HELIX-695
>                 URL: https://issues.apache.org/jira/browse/HELIX-695
>             Project: Apache Helix
>          Issue Type: Task
>            Reporter: Hao Zhang
>            Priority: Major
>
> Currently HelixManager does not notify the state listener when a connection
> is established. Adding this notification is useful because HelixManager
> exposes a method to get its ZkClient: when the connection is re-established,
> a new ZkClient is created, and users who obtained the client through that
> method should be notified so they can refresh it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] helix pull request #182: [HELIX-695] add helix manager listener for new conn...
GitHub user zhan849 opened a pull request:

    https://github.com/apache/helix/pull/182

    [HELIX-695] add helix manager listener for new connection notification

    In this PR I added invocation and related tests of `stateListener.onConnected()` method in ZkHelixManager when it is connected.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhan849/helix harry/helix-manager-onconnected

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #182

commit 65e84713503437c542e545abd521c2ba6d26
Author: Harry Zhang
Date: 2018-04-16T17:05:30Z

    [HELIX-695] add helix manager listener for new connection notification

---
[jira] [Created] (HELIX-695) Add Helix Manager listener for new connection notification
Hao Zhang created HELIX-695:
----------------------------

             Summary: Add Helix Manager listener for new connection notification
                 Key: HELIX-695
                 URL: https://issues.apache.org/jira/browse/HELIX-695
             Project: Apache Helix
          Issue Type: Task
            Reporter: Hao Zhang

Currently HelixManager does not notify the state listener when a connection is established. Adding this notification is useful because HelixManager exposes a method to get its ZkClient: when the connection is re-established, a new ZkClient is created, and users who obtained the client through that method should be notified so they can refresh it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
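The stale-client problem described in this issue can be illustrated with a small, self-contained sketch. Everything below is hypothetical: `Manager`, `Client`, `StateListener`, and `CachingUser` are stand-ins invented for the example and are not the Helix classes; only the idea of an `onConnected` callback comes from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a connection listener; not the Helix API.
final class ConnectionListenerSketch {

    /** Callback invoked each time the manager (re)connects. */
    interface StateListener {
        void onConnected(Manager manager);
    }

    /** Stand-in for a client object that is recreated per connection. */
    static final class Client {
        final int sessionId;

        Client(int sessionId) {
            this.sessionId = sessionId;
        }
    }

    /** Stand-in manager that creates a new client on every connect(). */
    static final class Manager {
        private final List<StateListener> listeners = new ArrayList<>();
        private Client client;
        private int sessions = 0;

        void addListener(StateListener listener) {
            listeners.add(listener);
        }

        Client getClient() {
            return client;
        }

        void connect() {
            client = new Client(++sessions);
            // Without this notification, holders of the old client go stale.
            for (StateListener listener : listeners) {
                listener.onConnected(this);
            }
        }
    }

    /** A user that caches the client and refreshes it on reconnect. */
    static final class CachingUser implements StateListener {
        private Client cached;

        @Override
        public void onConnected(Manager manager) {
            cached = manager.getClient();
        }

        Client client() {
            return cached;
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        CachingUser user = new CachingUser();
        manager.addListener(user);
        manager.connect();
        manager.connect(); // simulated reconnect creates a new client
        System.out.println(user.client() == manager.getClient()); // true
    }
}
```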
[jira] [Commented] (HELIX-690) Batch message should not share same NotificationContext object to update CurrentState
[ https://issues.apache.org/jira/browse/HELIX-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439716#comment-16439716 ]

ASF GitHub Bot commented on HELIX-690:
--------------------------------------

GitHub user zhan849 opened a pull request:

    https://github.com/apache/helix/pull/181

    [HELIX-690] batch message execution should not share same context

    In this PR, I added deep copy methods to NotificationContext so that, when processing messages in batch, different threads do not share the same notification context. This way, when processing BatchMessages, each thread has its own current state delta to work on, so current states won't be messed up.

    Also modified some logs to make them more useful when debugging.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhan849/helix harry/batch-msg-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #181

commit bb7751b0f52aadcf04b7813fa3e99c8e266a3d0b
Author: Harry Zhang
Date: 2018-04-16T16:55:43Z

    [HELIX-690] batch message execution should not share same context

> Batch message should not share same NotificationContext object to update CurrentState
> --------------------------------------------------------------------------------------
>
>                 Key: HELIX-690
>                 URL: https://issues.apache.org/jira/browse/HELIX-690
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Hao Zhang
>            Priority: Major
>
> Currently batch message has bugs:
> 1. Batch messages trigger many duplicated state transition messages from the
> controller, resulting in "state does not match" errors on the participant
> side. This in turn creates many ERROR znodes in ZK, which adds read/write
> load on both the participant and the controller.
> 2. We see a lot of concurrent update exceptions as well:
> {noformat}
> 9909348:[2018-03-30 18:59:55,025] [ERROR] [pool-1-thread-1917] [org.apache.helix.messaging.handling.HelixTask:113] - Exception while executing a message. java.util.ConcurrentModificationException msgId: fbdc37d4-ec95-47cb-950c-f9d3d224bbb3 type: STATE_TRANSITION
> 9909349-java.util.ConcurrentModificationException
> 9909350-	at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
> 9909351-	at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
> 9909352-	at org.apache.helix.ZNRecord.merge(ZNRecord.java:497)
> 9909353-	at org.apache.helix.GroupCommit.commit(GroupCommit.java:121)
> 9909354-	at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:182)
> 9909355-	at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:170)
> 9909356-	at org.apache.helix.messaging.handling.BatchMessageHandler.postHandleMessage(BatchMessageHandler.java:118)
> 9909357-	at org.apache.helix.messaging.handling.BatchMessageHandler.handleMessage(BatchMessageHandler.java:203)
> 9909358-	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:96)
> {noformat}
> Both errors result from the fact that, in HelixTaskExecutor, all HelixTask
> objects from the same batch of messages share the same changeContext object.
> For a batch message, HelixTask creates a current state update map to record
> current state updates, which leads to a race condition in current state
> recording. Because of this bug it is common for a resource's current state to
> change on the participant side without being updated in ZK; after the message
> is removed, the controller still thinks that the state transition is not
> finished and sends a duplicated state transition message.
>
> The error situation is only triggered when the load is high, so it is not
> covered by our unit / e2e tests.
>
> To fix the issue, we should create deep copies of the NotificationContext
> object for each HelixTask in HelixTaskExecutor. I tried this fix using large
> data sets, and it worked.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
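The fix proposed in this issue, one deep copy of the context per task, can be sketched as follows. This is an illustration only: `Context`, `copiesForBatch`, and the partition names are hypothetical and do not reflect the real `NotificationContext` fields or the deep copy methods added in the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Context is a stand-in, not Helix's NotificationContext.
final class ContextCopySketch {

    /** Minimal context that records per-partition state updates. */
    static final class Context {
        private final Map<String, String> stateUpdates = new HashMap<>();

        Context() {
        }

        /** Copy constructor; values are immutable strings, so this copy is deep. */
        Context(Context other) {
            stateUpdates.putAll(other.stateUpdates);
        }

        void record(String partition, String state) {
            stateUpdates.put(partition, state);
        }

        Map<String, String> updates() {
            return stateUpdates;
        }
    }

    /** Gives every task in a batch its own private copy of the shared context. */
    static List<Context> copiesForBatch(Context shared, int taskCount) {
        List<Context> copies = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            copies.add(new Context(shared));
        }
        return copies;
    }

    public static void main(String[] args) throws InterruptedException {
        Context shared = new Context();
        shared.record("p0", "OFFLINE");

        List<Context> contexts = copiesForBatch(shared, 4);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < contexts.size(); i++) {
            final int id = i;
            // Each thread writes only to its own copy, so the unsynchronized
            // HashMap is never mutated concurrently.
            Thread t = new Thread(
                () -> contexts.get(id).record("p" + (id + 1), "ONLINE"));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }

        System.out.println(shared.updates().size());          // 1
        System.out.println(contexts.get(0).updates().size()); // 2
    }
}
```

Copying trades a little memory per task for the removal of shared mutable state, which avoids adding locks around the update map.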