[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423802#comment-15423802
 ] 

ASF GitHub Bot commented on NIFI-2566:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/866


> Clustered Nodes can become out of sync regarding which node is coordinator
> --
>
> Key: NIFI-2566
> URL: https://issues.apache.org/jira/browse/NIFI-2566
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 1.0.0
>
>
> Occasionally, I will see the UI telling me that no Cluster Coordinator has 
> been elected. However, I can see in the logs that the node is sending 
> heartbeats to the coordinator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

2016-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423798#comment-15423798
 ] 

ASF subversion and git services commented on NIFI-2566:
---

Commit e42ea9ad457c5b5a1d2da5fa3d3494aacb0cb8d4 in nifi's branch 
refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=e42ea9a ]

NIFI-2566: Refactored to allow just the Leader Election Manager to be 
responsible for determining who is the Cluster Coordinator

NIFI-2566: Removed storage of cluster roles from heartbeats and 
NodeConnectionStatus; use LeaderElectionManager to determine roles instead

NIFI-2566: Updated Heartbeats so that if a node is out-of-sync with cluster 
topology, cluster coordinator will provide updated information back to the nodes

NIFI-2566: Fixed issue that prevented standalone instance from starting by 
creating a standalone-instance version of the Leader Election Manager. Also 
added Controller Service enabled/disabled state to fingerprint rather than 
attempting to update the state when joining the cluster, as the implementation 
was incorrect and the correct implementation will be a rather significant 
effort that doesn't have to happen for 1.0.0 release

This closes #866

Signed-off-by: jpercivall 
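
The first commit message above describes making the Leader Election Manager the only authority on who the Cluster Coordinator is. A minimal sketch of that idea follows. `ElectionSketch`, `InMemoryElection`, `HeartbeatAddressResolver` and the role-name string are hypothetical names chosen for illustration and are not NiFi's actual classes; only the "ask the election manager, fail if nobody is elected" shape is taken from the diff in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a leader election manager; not NiFi's actual interface.
interface ElectionSketch {
    // Returns the elected leader's address for the role, or null if none is elected.
    String getLeader(String roleName);
}

// In-memory election state, used here only so the sketch can run on its own.
class InMemoryElection implements ElectionSketch {
    private final Map<String, String> leaders = new ConcurrentHashMap<>();

    void elect(final String roleName, final String address) {
        leaders.put(roleName, address);
    }

    @Override
    public String getLeader(final String roleName) {
        return leaders.get(roleName);
    }
}

// Resolves the heartbeat target by asking the election manager on every call,
// so there is no second, cached copy of the coordinator address to go stale.
class HeartbeatAddressResolver {
    // Role name is an assumption made for this sketch.
    static final String CLUSTER_COORDINATOR = "Cluster Coordinator";

    private final ElectionSketch election;

    HeartbeatAddressResolver(final ElectionSketch election) {
        this.election = election;
    }

    String getHeartbeatAddress() {
        final String address = election.getLeader(CLUSTER_COORDINATOR);
        if (address == null) {
            throw new IllegalStateException(
                "Cannot send heartbeat because there is no Cluster Coordinator currently elected");
        }
        return address;
    }
}

class HeartbeatAddressDemo {
    public static void main(final String[] args) {
        final InMemoryElection election = new InMemoryElection();
        final HeartbeatAddressResolver resolver = new HeartbeatAddressResolver(election);

        election.elect(HeartbeatAddressResolver.CLUSTER_COORDINATOR, "node1:9001");
        System.out.println(resolver.getHeartbeatAddress());
    }
}
```

Because the resolver holds no state of its own, a re-election is visible on the very next call; the trade-off is one lookup per heartbeat instead of a cached value.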




[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422849#comment-15422849
 ] 

ASF GitHub Bot commented on NIFI-2566:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/866
  
@JPercivall I pushed an update that should address this issue.




[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

2016-08-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421911#comment-15421911
 ] 

ASF GitHub Bot commented on NIFI-2566:
--

Github user JPercivall commented on the issue:

https://github.com/apache/nifi/pull/866
  
@markap14, I'm having trouble running a standalone instance with this PR. 
The FlowController fails to start with the following error (the message is 
very long, so I only took the end):

`Caused by: java.lang.IllegalStateException: The 'nifi.zookeeper.connect.string' property is not set in nifi.properties
    at org.apache.nifi.controller.cluster.ZooKeeperClientConfig.createConfig(ZooKeeperClientConfig.java:76) ~[nifi-framework-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.<init>(CuratorLeaderElectionManager.java:61) ~[nifi-framework-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_74]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_74]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_74]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_74]
    at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:147) ~[spring-beans-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    ... 52 common frames omitted`
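
The trace shows a ZooKeeper-backed election manager being constructed on a standalone instance, where no connect string is configured. A minimal sketch of one way to avoid that is to pick the implementation by mode. `RoleElection`, `ZooKeeperRoleElection`, `StandaloneRoleElection`, `RoleElectionFactory` and the `clustered` flag are hypothetical names for illustration, not NiFi's actual API; only the property name and the error text come from the trace.

```java
import java.util.Properties;

// Hypothetical election interface for this sketch.
interface RoleElection {
    String getLeader(String roleName);

    boolean isLeader(String roleName);
}

// Stand-in for a ZooKeeper-backed implementation: it needs a connect string
// and refuses to be constructed without one, like the failure in the trace.
class ZooKeeperRoleElection implements RoleElection {
    private final String connectString;

    ZooKeeperRoleElection(final Properties properties) {
        connectString = properties.getProperty("nifi.zookeeper.connect.string");
        if (connectString == null || connectString.trim().isEmpty()) {
            throw new IllegalStateException(
                "The 'nifi.zookeeper.connect.string' property is not set in nifi.properties");
        }
    }

    @Override
    public String getLeader(final String roleName) {
        return null; // a real implementation would consult ZooKeeper; omitted here
    }

    @Override
    public boolean isLeader(final String roleName) {
        return false; // omitted, as above
    }
}

// Standalone variant: no cluster, so no role ever has a leader and
// no ZooKeeper configuration is read at all.
class StandaloneRoleElection implements RoleElection {
    @Override
    public String getLeader(final String roleName) {
        return null;
    }

    @Override
    public boolean isLeader(final String roleName) {
        return false;
    }
}

class RoleElectionFactory {
    // The 'clustered' flag is an assumption; how the mode is detected is out of scope.
    static RoleElection create(final boolean clustered, final Properties properties) {
        return clustered ? new ZooKeeperRoleElection(properties) : new StandaloneRoleElection();
    }
}

class RoleElectionDemo {
    public static void main(final String[] args) {
        // Standalone mode starts even though the connect string is missing.
        final RoleElection election = RoleElectionFactory.create(false, new Properties());
        System.out.println(election.getClass().getSimpleName());
    }
}
```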





[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

2016-08-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421657#comment-15421657
 ] 

ASF GitHub Bot commented on NIFI-2566:
--

Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/866#discussion_r74833401
  
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/cluster/ClusterProtocolHeartbeater.java ---
@@ -18,102 +18,80 @@
 package org.apache.nifi.controller.cluster;
 
 import java.io.IOException;
-import java.nio.charset.StandardCharsets;
-import java.util.Properties;
-
-import org.apache.curator.RetryPolicy;
-import org.apache.curator.framework.CuratorFramework;
-import org.apache.curator.framework.CuratorFrameworkFactory;
-import org.apache.curator.retry.RetryNTimes;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.nifi.cluster.coordination.ClusterCoordinator;
+import org.apache.nifi.cluster.coordination.node.ClusterRoles;
+import org.apache.nifi.cluster.coordination.node.NodeConnectionStatus;
+import org.apache.nifi.cluster.protocol.HeartbeatPayload;
+import org.apache.nifi.cluster.protocol.NodeIdentifier;
 import org.apache.nifi.cluster.protocol.NodeProtocolSender;
 import org.apache.nifi.cluster.protocol.ProtocolException;
 import org.apache.nifi.cluster.protocol.message.HeartbeatMessage;
-import org.apache.zookeeper.WatchedEvent;
-import org.apache.zookeeper.Watcher;
+import org.apache.nifi.cluster.protocol.message.HeartbeatResponseMessage;
+import org.apache.nifi.controller.leader.election.LeaderElectionManager;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 /**
- * Uses ZooKeeper in order to determine which node is the elected Cluster Coordinator and to indicate
- * that this node is part of the cluster. However, once the Cluster Coordinator is known, heartbeats are
+ * Uses Leader Election Manager in order to determine which node is the elected Cluster Coordinator and to indicate
+ * that this node is part of the cluster. Once the Cluster Coordinator is known, heartbeats are
  * sent directly to the Cluster Coordinator.
  */
 public class ClusterProtocolHeartbeater implements Heartbeater {
     private static final Logger logger = LoggerFactory.getLogger(ClusterProtocolHeartbeater.class);
 
     private final NodeProtocolSender protocolSender;
-    private final CuratorFramework curatorClient;
-    private final String nodesPathPrefix;
-
-    private final String coordinatorPath;
-    private volatile String coordinatorAddress;
+    private final LeaderElectionManager electionManager;
+    private final ClusterCoordinator clusterCoordinator;
 
-
-    public ClusterProtocolHeartbeater(final NodeProtocolSender protocolSender, final Properties properties) {
+    public ClusterProtocolHeartbeater(final NodeProtocolSender protocolSender, final ClusterCoordinator clusterCoordinator, final LeaderElectionManager electionManager) {
         this.protocolSender = protocolSender;
-
-        final RetryPolicy retryPolicy = new RetryNTimes(10, 500);
-        final ZooKeeperClientConfig zkConfig = ZooKeeperClientConfig.createConfig(properties);
-
-        curatorClient = CuratorFrameworkFactory.newClient(zkConfig.getConnectString(),
-            zkConfig.getSessionTimeoutMillis(), zkConfig.getConnectionTimeoutMillis(), retryPolicy);
-
-        curatorClient.start();
-        nodesPathPrefix = zkConfig.resolvePath("cluster/nodes");
-        coordinatorPath = nodesPathPrefix + "/coordinator";
+        this.clusterCoordinator = clusterCoordinator;
+        this.electionManager = electionManager;
     }
 
     @Override
     public String getHeartbeatAddress() throws IOException {
-        final String curAddress = coordinatorAddress;
-        if (curAddress != null) {
-            return curAddress;
+        final String heartbeatAddress = electionManager.getLeader(ClusterRoles.CLUSTER_COORDINATOR);
+        if (heartbeatAddress == null) {
+            throw new ProtocolException("Cannot send heartbeat because there is no Cluster Coordinator currently elected");
         }
 
-        try {
-            // Get coordinator address and add watcher to change who we are heartbeating to if the value changes.
-            final byte[] coordinatorAddressBytes = curatorClient.getData().usingWatcher(new Watcher() {
-                @Override
-                public void process(final WatchedEvent event) {
-                    coordinatorAddress = null;
-                }
  

[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

2016-08-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421632#comment-15421632
 ] 

ASF GitHub Bot commented on NIFI-2566:
--

GitHub user markap14 opened a pull request:

https://github.com/apache/nifi/pull/866

NIFI-2566: Refactoring to improve robustness of cluster



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markap14/nifi NIFI-2566

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #866


commit 90dc74627157296b4084ed65c1b752bfe25db1e8
Author: Mark Payne 
Date:   2016-08-13T23:38:07Z

NIFI-2566: Refactored to allow just the Leader Election Manager to be 
responsible for determining who is the Cluster Coordinator

commit 3fd121b2ff57c5fc51171791351580c29c58e553
Author: Mark Payne 
Date:   2016-08-14T23:40:31Z

NIFI-2566: Removed storage of cluster roles from heartbeats and 
NodeConnectionStatus; use LeaderElectionManager to determine roles instead

commit 457a2df5c3296013ce896ca7900807d9bdb69a71
Author: Mark Payne 
Date:   2016-08-15T20:35:56Z

NIFI-2566: Updated Heartbeats so that if a node is out-of-sync with cluster 
topology, cluster coordinator will provide updated information back to the nodes
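
The third commit has the coordinator send updated information back when a heartbeating node is out of sync with the cluster topology. A rough sketch of that exchange, under assumptions: `TopologySyncSketch` and its methods are hypothetical, node statuses are modeled as plain strings, and "out of sync" is taken to mean the node's view differs from the coordinator's. The actual PR may detect and repair the mismatch differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical coordinator-side handler; not NiFi's actual heartbeat code.
class TopologySyncSketch {
    // The coordinator's authoritative view: node id -> connection state.
    private final Map<String, String> authoritative = new HashMap<>();

    void setStatus(final String nodeId, final String state) {
        authoritative.put(nodeId, state);
    }

    // Handles a heartbeat carrying the node's own view of the topology.
    // Returns an empty Optional when the views match; otherwise returns a
    // copy of the coordinator's view for the node to adopt.
    Optional<Map<String, String>> handleHeartbeat(final Map<String, String> nodeView) {
        if (authoritative.equals(nodeView)) {
            return Optional.empty();
        }
        return Optional.of(new HashMap<>(authoritative));
    }
}

class TopologySyncDemo {
    public static void main(final String[] args) {
        final TopologySyncSketch coordinator = new TopologySyncSketch();
        coordinator.setStatus("node1", "CONNECTED");
        coordinator.setStatus("node2", "DISCONNECTED");

        // node1 still believes node2 is connected, so it gets a correction.
        final Map<String, String> staleView = new HashMap<>();
        staleView.put("node1", "CONNECTED");
        staleView.put("node2", "CONNECTED");

        final Optional<Map<String, String>> update = coordinator.handleHeartbeat(staleView);
        System.out.println(update.isPresent());
    }
}
```

Piggybacking the correction on the heartbeat response means a node that missed a status broadcast converges on its next heartbeat rather than staying stale indefinitely.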



