[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-04-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451358#comment-16451358
 ] 

Hudson commented on HELIX-682:
--

FAILURE: Integrated in Jenkins build helix #1451 (See 
[https://builds.apache.org/job/helix/1451/])
[HELIX-682] delete duplicated message and log error in HelixTaskExecutor 
(zhan849: rev 5f9fadc72bc1916f008792707db848ee51bbd997)
* (edit) helix-core/src/test/java/org/apache/helix/messaging/handling/TestHelixTaskExecutor.java
* (edit) helix-core/src/test/java/org/apache/helix/MockAccessor.java
* (edit) helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java


> Stale message should not prevent controller from rebalancing resource
> -
>
> Key: HELIX-682
> URL: https://issues.apache.org/jira/browse/HELIX-682
> Project: Apache Helix
> Issue Type: Bug
> Reporter: Hao Zhang
> Priority: Major
>
> Currently, during MessageGenerationPhase, we skip rebalancing when there is a 
> pending message. Although we assume that participants delete messages when 
> they finish the task, there are cases where ZK is unstable and a participant 
> fails to do so, which leaves the message undeleted and thus blocks 
> rebalancing.
> Ideally, the controller should try to delete the message as well: if the 
> partition's current state is the same as the message's toState, or a totally 
> invalid message remains, the controller should try to delete the message to 
> unblock rebalancing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451357#comment-16451357
 ] 

ASF GitHub Bot commented on HELIX-682:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/195




[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451348#comment-16451348
 ] 

ASF GitHub Bot commented on HELIX-682:
--

GitHub user zhan849 opened a pull request:

https://github.com/apache/helix/pull/195

[HELIX-682] delete duplicated message and log error in HelixTaskExecutor on 
participant

This PR is the second part of message dedup, on the participant side.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhan849/helix harry/participant-msg-dedup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 8aba9bea0734da11722fbc8cceb74f34dd6a37c6
Author: Harry Zhang 
Date:   2018-04-24T22:34:08Z

[HELIX-682] delete duplicated message and log error in HelixTaskExecutor on 
participant






[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-03-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412879#comment-16412879
 ] 

Hudson commented on HELIX-682:
--

FAILURE: Integrated in Jenkins build helix #1421 (See 
[https://builds.apache.org/job/helix/1421/])
[HELIX-682] controller should delete obsolete messages with timeout to 
(zhan849: rev 4d652eb9aabaeaacbcfd2df6eba70b5f8c442094)
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
* (edit) helix-core/src/test/java/org/apache/helix/controller/stages/TestRebalancePipeline.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateOutput.java




[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412877#comment-16412877
 ] 

ASF GitHub Bot commented on HELIX-682:
--

Github user asfgit closed the pull request at:

https://github.com/apache/helix/pull/156




[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412860#comment-16412860
 ] 

ASF GitHub Bot commented on HELIX-682:
--

Github user lei-xia commented on the issue:

https://github.com/apache/helix/pull/156
  
Please rebase to HEAD




[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409910#comment-16409910
 ] 

ASF GitHub Bot commented on HELIX-682:
--

Github user zhan849 commented on a diff in the pull request:

https://github.com/apache/helix/pull/156#discussion_r176498024
  
--- Diff: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java ---
@@ -121,6 +131,18 @@ public void process(ClusterEvent event) throws Exception {
 
   Message message = null;
 
+  if (shouldCleanUpPendingMessage(pendingMessage, currentState,
+      currentStateOutput.getEndTime(resourceName, partition, instanceName))) {
+    logger.info(
+        "Adding pending message {} on instance {} to GC. Msg: {}->{}, current state of resource {}:{} is {}",
--- End diff --

changed it to "cleanup"




[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-03-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408772#comment-16408772
 ] 

ASF GitHub Bot commented on HELIX-682:
--

Github user dasahcc commented on a diff in the pull request:

https://github.com/apache/helix/pull/156#discussion_r176271788
  
--- Diff: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java ---
@@ -121,6 +131,18 @@ public void process(ClusterEvent event) throws Exception {
 
   Message message = null;
 
+  if (shouldCleanUpPendingMessage(pendingMessage, currentState,
+      currentStateOutput.getEndTime(resourceName, partition, instanceName))) {
+    logger.info(
+        "Adding pending message {} on instance {} to GC. Msg: {}->{}, current state of resource {}:{} is {}",
--- End diff --

Let's not use the name "GC" for it.




[jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource

2018-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407337#comment-16407337
 ] 

ASF GitHub Bot commented on HELIX-682:
--

GitHub user zhan849 opened a pull request:

https://github.com/apache/helix/pull/156

[HELIX-682] controller should delete obsolete messages with timeout to 
unblock state transition

This RB contains the controller-side implementation and tests: during 
MessageGenerationPhase, the controller checks whether the pending message on a 
participant should be cleaned up to unblock further state transitions:

- If the partition's current state is the same as the message's toState and 
the 3-second timeout has already passed, the participant has likely failed to 
delete the message, so the controller should proactively remove it to unblock 
further rebalancing.
- If the partition's current state is the same as the message's fromState, 
the partition is undergoing the state transition or the transition has not 
started yet, so we do nothing.
- If the partition's current state is neither the message's fromState nor its 
toState (almost impossible), the message is problematic, and it is safe to 
delete it immediately so that the participant does not handle the message 
unnecessarily.

Message deletion on the controller side is asynchronous.
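
The three cases above can be read as a small decision function. The following 
is only a rough sketch of that decision, not the code in this PR: the class 
name PendingMessageCleanupSketch, the method shouldCleanUp, its parameters, and 
the constant PURGE_DELAY_MS are all hypothetical, and treating a missing 
transition end time as "keep waiting" is an assumption that the description 
does not spell out.

```java
import java.util.Objects;

/**
 * Illustrative sketch of the cleanup decision described above.
 * Not the Helix implementation; every name in this class is hypothetical.
 */
final class PendingMessageCleanupSketch {
  /** The 3-second timeout mentioned in the description, in milliseconds. */
  static final long PURGE_DELAY_MS = 3_000L;

  private PendingMessageCleanupSketch() {}

  /**
   * @param fromState           the pending message's fromState
   * @param toState             the pending message's toState
   * @param currentState        the partition's current state on the instance
   * @param transitionEndTimeMs when the last transition ended, or null if unknown
   * @param nowMs               the current time in milliseconds
   * @return true if the controller should delete the pending message
   */
  static boolean shouldCleanUp(String fromState, String toState, String currentState,
      Long transitionEndTimeMs, long nowMs) {
    if (Objects.equals(currentState, toState)) {
      // Case 1: the transition looks finished. Give the participant the
      // timeout to delete the message itself before the controller steps in.
      // Assumption: with no recorded end time, keep waiting.
      return transitionEndTimeMs != null
          && nowMs - transitionEndTimeMs >= PURGE_DELAY_MS;
    }
    if (Objects.equals(currentState, fromState)) {
      // Case 2: the transition is in progress or has not started; do nothing.
      return false;
    }
    // Case 3: current state matches neither end of the message, so the
    // message is treated as invalid and can be removed immediately.
    return true;
  }
}
```

For example, with a message for SLAVE->MASTER, a partition already in MASTER 
is cleaned up only once 3 seconds have passed since the transition ended, a 
partition still in SLAVE is left alone, and a partition in OFFLINE has the 
message removed right away. Passing the clock in as nowMs keeps the sketch 
easy to test.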

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhan849/helix harry/controller-msg-dedup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #156


commit 9f789dee0b17886bd97ebf4cc14e9d867043183d
Author: Harry Zhang 
Date:   2018-03-21T01:47:02Z

[HELIX-682] controller should delete obsolete messages with timeout to 
unblock state transition



