[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446348#comment-16446348
 ] 

ASF subversion and git services commented on NIFI-5096:
---

Commit 54eb6bc23211ad2b499f42e14759f3646f806d2f in nifi's branch 
refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=54eb6bc ]

NIFI-5096: Periodically poll ZooKeeper to determine the leader for each 
registered role in Leader Election. This avoids a condition whereby a node may 
occasionally fail to receive notification that it is no longer the elected 
leader.
NIFI-5096: More proactively setting leadership to false when ZooKeeper/Curator 
ConnectionState changes

This closes #2646


> When Primary Node changes, occasionally both the new and old primary nodes 
> continue running processors
> --
>
> Key: NIFI-5096
> URL: https://issues.apache.org/jira/browse/NIFI-5096
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 1.7.0
>
>
> Occasionally we will see that Node A is Primary Node and then the Primary 
> Node switches to Node B, resulting in both Node A and Node B running 
> processors that are marked as Primary Node only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446350#comment-16446350
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2646
  
Thanks @markap14! This has been merged to master.




[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446351#comment-16446351
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2646




[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446272#comment-16446272
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2646
  
@mcgilman I agree. I have pushed a new commit that does just that.




[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446234#comment-16446234
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2646
  
@markap14 I see. It appears then that the underlying issue is that either 
(1) the `stateChanged` method is not being invoked or (2) the leader thread 
interruption is not happening or not working. We could include your proposed 
changes and also update our implementation of `stateChanged` to set `leader` to 
false when the `newState` is LOST or SUSPENDED before invoking super. 
Additionally, in `takeLeadership` we should loop only while the listener is not 
stopped and is still the leader. This should help if the underlying issue is 
(2), while the polling acts as additional insurance.
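
The two changes proposed above can be illustrated with a minimal, 
self-contained sketch. It uses only the JDK and does not call Curator: 
`SketchConnectionState` and `SketchElectionListener` are hypothetical stand-ins 
for Curator's `ConnectionState` and for our election listener, and the 
`pollMillis` parameter is an assumption made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Curator's ConnectionState (not the real type).
enum SketchConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

// Hypothetical model of an election listener; not NiFi's actual class.
class SketchElectionListener {
    private final AtomicBoolean leader = new AtomicBoolean(false);
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final long pollMillis; // assumed wake-up interval for the loop

    SketchElectionListener(long pollMillis) {
        this.pollMillis = pollMillis;
    }

    boolean isLeader() {
        return leader.get();
    }

    void stop() {
        stopped.set(true);
    }

    // Clear the flag first, so leadership is given up even if the interrupt
    // that normally follows never reaches the leader thread.
    void stateChanged(SketchConnectionState newState) {
        if (newState == SketchConnectionState.LOST
                || newState == SketchConnectionState.SUSPENDED) {
            leader.set(false);
        }
        // A real listener would delegate to super.stateChanged(...) here.
    }

    // Holds leadership until stopped, demoted via stateChanged, or interrupted.
    void takeLeadership() {
        leader.set(true);
        try {
            while (!stopped.get() && leader.get()) {
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            leader.set(false);
        }
    }
}
```

With this shape, a SUSPENDED or LOST notification ends the loop within one 
`pollMillis` interval even when no interrupt arrives, which covers case (2) 
but not case (1).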




[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446163#comment-16446163
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2646
  
@mcgilman we do indeed implement the ConnectionStateListener, but we do so 
only to log the state change and then call super.stateChanged(). When the new 
state is SUSPENDED or LOST, super.stateChanged() throws 
CancelLeadershipException, which in turn is supposed to interrupt our 
listener. We followed the "Error Handling" guidance provided by Apache Curator: 
https://curator.apache.org/curator-recipes/leader-election.html

So we are handling the SUSPENDED and LOST scenarios as recommended, and this 
works 99% of the time. Unfortunately, we do occasionally see scenarios where 
the thread is not interrupted and, as a result, the node believes that it still 
holds the lock. When this happens, it's not clear whether the thread just 
wasn't interrupted for some reason, whether the SUSPENDED/LOST notification was 
never received, or whether something else is preventing our ElectionListener 
from being interrupted.

That's why I went with the solution of periodically polling ZooKeeper to 
check the state. That way, whatever the cause of the thread not being 
interrupted, we will still break out. If you think it makes sense, though, we 
can also detect the LOST state specifically and have that trigger us to leave 
the election, in addition to polling?
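
As a rough illustration of the polling idea (a sketch under stated 
assumptions, not the code in this PR): the class below uses only the JDK. 
`SketchLeaderPoller` is a hypothetical name, and `leaderLookup` is a 
hypothetical function standing in for whatever query asks ZooKeeper who 
currently leads a given role. The sketch only demotes the local node; whether 
polling should also promote it is left open here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical poller; names and structure are assumptions for illustration.
class SketchLeaderPoller {
    private final String localNodeId;
    // role name -> id of the current leader (stand-in for a ZooKeeper query)
    private final Function<String, String> leaderLookup;
    // role name -> whether this node currently believes it is the leader
    private final Map<String, Boolean> believedLeader = new ConcurrentHashMap<>();

    SketchLeaderPoller(String localNodeId, Function<String, String> leaderLookup) {
        this.localNodeId = localNodeId;
        this.leaderLookup = leaderLookup;
    }

    void register(String role, boolean currentlyLeader) {
        believedLeader.put(role, currentlyLeader);
    }

    boolean isLeader(String role) {
        return believedLeader.getOrDefault(role, false);
    }

    // For every registered role, drop local leadership if the lookup names
    // another node (or no node) as leader.
    void pollOnce() {
        for (String role : believedLeader.keySet()) {
            String actualLeader = leaderLookup.apply(role);
            if (!localNodeId.equals(actualLeader)) {
                believedLeader.put(role, false);
            }
        }
    }

    // Runs pollOnce periodically; the caller shuts the executor down.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::pollOnce, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Because the check does not depend on any notification or interrupt being 
delivered, a stale leader is corrected within one polling period regardless of 
why the notification was missed.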




[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1654#comment-1654
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2646
  
@markap14 Looks like our `LeaderSelectorListener` does implement 
`ConnectionStateListener`. Do we need to update code on our side to explicitly 
give up leadership in the SUSPENDED or LOST scenarios?




[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1632#comment-1632
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2646
  
@markap14 These changes look like they will reduce the time during which 
there are multiple primaries (leaders) when the existing primary isn't 
notified of the change. That's definitely an improvement. However, I was 
curious whether you tried incorporating a `ConnectionStateListener` so that the 
node is notified more proactively when the connection is SUSPENDED or LOST. 
Here's an SO post [1] where it is discussed.

[1] 
https://stackoverflow.com/questions/41042798/how-to-handle-apache-curator-distributed-lock-loss-of-connection




[jira] [Commented] (NIFI-5096) When Primary Node changes, occasionally both the new and old primary nodes continue running processors

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444031#comment-16444031
 ] 

ASF GitHub Bot commented on NIFI-5096:
--

GitHub user markap14 opened a pull request:

https://github.com/apache/nifi/pull/2646

NIFI-5096: Periodically poll ZooKeeper to determine the leader for each 
registered role in Leader Election. This avoids a condition whereby a node may 
occasionally fail to receive notification that it is no longer the elected 
leader.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markap14/nifi NIFI-5096

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2646


commit 757c4e2445d052b593fbea8f0d9a36bac001f44f
Author: Mark Payne 
Date:   2018-04-19T13:05:32Z

NIFI-5096: Periodically poll ZooKeeper to determine the leader for each 
registered role in Leader Election. This avoids a condition whereby a node may 
occasionally fail to receive notification that it is no longer the elected 
leader.



