[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

[ https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423802#comment-15423802 ]

ASF GitHub Bot commented on NIFI-2566:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/866

> Clustered Nodes can become out of sync regarding which node is coordinator
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-2566
>                 URL: https://issues.apache.org/jira/browse/NIFI-2566
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.0.0
>
>
> Occasionally, I will see the UI telling me that no Cluster Coordinator has
> been elected. However, I can see in the logs that the node is sending
> heartbeats to the coordinator.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

[ https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423798#comment-15423798 ]

ASF subversion and git services commented on NIFI-2566:
-------------------------------------------------------

Commit e42ea9ad457c5b5a1d2da5fa3d3494aacb0cb8d4 in nifi's branch refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=e42ea9a ]

NIFI-2566: Refactored to allow just the Leader Election Manager to be responsible for determining who is the Cluster Coordinator

NIFI-2566: Removed storage of cluster roles from heartbeats and NodeConnectionStatus; use LeaderElectionManager to determine roles instead

NIFI-2566: Updated Heartbeats so that if a node is out-of-sync with cluster topology, cluster coordinator will provide updated information back to the nodes

NIFI-2566: Fixed issue that prevented standalone instance from starting by creating a standalone-instance version of the Leader Election Manager. Also added Controller Service enabled/disabled state to fingerprint rather than attempting to update the state when joining the cluster, as the implementation was incorrect and the correct implementation will be a rather significant effort that doesn't have to happen for 1.0.0 release

This closes #866

Signed-off-by: jpercivall

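
The third item in the commit message (the coordinator sends updated topology back to a node whose heartbeat shows it is out of sync) can be illustrated with a small sketch. This is not the NiFi implementation: the class `TopologySyncSketch`, its methods, and the use of plain strings for node ids and connection states are all hypothetical, and the sketch ignores nodes that the coordinator has removed entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a coordinator answering heartbeats; not NiFi code. */
class TopologySyncSketch {
    // The coordinator's authoritative view: node id -> connection state.
    private final Map<String, String> coordinatorView = new HashMap<>();

    void setState(String nodeId, String state) {
        coordinatorView.put(nodeId, state);
    }

    /**
     * Compares the view a node reported in its heartbeat with the coordinator's view.
     * Returns the entries the node must apply to catch up; an empty map means in sync.
     */
    Map<String, String> handleHeartbeat(Map<String, String> nodeView) {
        Map<String, String> updates = new HashMap<>();
        for (Map.Entry<String, String> entry : coordinatorView.entrySet()) {
            // Any missing or differing entry is sent back in the heartbeat response.
            if (!entry.getValue().equals(nodeView.get(entry.getKey()))) {
                updates.put(entry.getKey(), entry.getValue());
            }
        }
        return updates;
    }
}
```

A node would merge the returned map into its local view, so that its next heartbeat yields an empty response.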
[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

[ https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422849#comment-15422849 ]

ASF GitHub Bot commented on NIFI-2566:
--------------------------------------

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/866

    @JPercivall I pushed an update that should address this issue.

[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

[ https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421911#comment-15421911 ]

ASF GitHub Bot commented on NIFI-2566:
--------------------------------------

Github user JPercivall commented on the issue:

    https://github.com/apache/nifi/pull/866

    @markap14, I'm having trouble running a standalone instance with this PR. I get an error that I can't start the FlowController due to (very long message, took the end):

    `Caused by: java.lang.IllegalStateException: The 'nifi.zookeeper.connect.string' property is not set in nifi.properties
        at org.apache.nifi.controller.cluster.ZooKeeperClientConfig.createConfig(ZooKeeperClientConfig.java:76) ~[nifi-framework-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.<init>(CuratorLeaderElectionManager.java:61) ~[nifi-framework-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_74]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_74]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_74]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_74]
        at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:147) ~[spring-beans-4.2.4.RELEASE.jar:4.2.4.RELEASE]
        ... 52 common frames omitted`

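
The trace above shows the ZooKeeper-backed election manager being constructed even though the instance is standalone and has no connect string. A minimal sketch of one way to avoid that, choosing a no-op manager when the instance is not a cluster node, is given here. All class names ending in `Sketch` are hypothetical, and the use of the `nifi.cluster.is.node` property to make the choice is an assumption made for illustration, not a description of how the actual fix selects the implementation.

```java
import java.util.Properties;

/** Hypothetical minimal view of a leader election manager; not the NiFi interface. */
interface ElectionManagerSketch {
    String getLeader(String roleName);
}

/** No-op variant for a standalone instance: needs no ZooKeeper and never reports a leader. */
class StandaloneElectionManagerSketch implements ElectionManagerSketch {
    @Override
    public String getLeader(String roleName) {
        return null;
    }
}

/** Stand-in for a ZooKeeper-backed manager; fails fast when the connect string is missing. */
class ZooKeeperElectionManagerSketch implements ElectionManagerSketch {
    private final String connectString;

    ZooKeeperElectionManagerSketch(Properties props) {
        connectString = props.getProperty("nifi.zookeeper.connect.string");
        if (connectString == null || connectString.trim().isEmpty()) {
            throw new IllegalStateException(
                "The 'nifi.zookeeper.connect.string' property is not set in nifi.properties");
        }
    }

    @Override
    public String getLeader(String roleName) {
        return null; // the real election logic is omitted from this sketch
    }
}

class ElectionManagerFactorySketch {
    /** Assumption: the clustered flag decides which implementation is created. */
    static ElectionManagerSketch create(Properties props) {
        boolean clustered = Boolean.parseBoolean(props.getProperty("nifi.cluster.is.node", "false"));
        return clustered
            ? new ZooKeeperElectionManagerSketch(props)
            : new StandaloneElectionManagerSketch();
    }
}
```

With this split, a standalone instance never touches the ZooKeeper configuration, so the missing property only matters when clustering is enabled.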
[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

[ https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421657#comment-15421657 ]

ASF GitHub Bot commented on NIFI-2566:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/866#discussion_r74833401

    --- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/cluster/ClusterProtocolHeartbeater.java ---
    @@ -18,102 +18,80 @@ package org.apache.nifi.controller.cluster;

     import java.io.IOException;
    -import java.nio.charset.StandardCharsets;
    -import java.util.Properties;
    -
    -import org.apache.curator.RetryPolicy;
    -import org.apache.curator.framework.CuratorFramework;
    -import org.apache.curator.framework.CuratorFrameworkFactory;
    -import org.apache.curator.retry.RetryNTimes;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.stream.Collectors;
    +
    +import org.apache.nifi.cluster.coordination.ClusterCoordinator;
    +import org.apache.nifi.cluster.coordination.node.ClusterRoles;
    +import org.apache.nifi.cluster.coordination.node.NodeConnectionStatus;
    +import org.apache.nifi.cluster.protocol.HeartbeatPayload;
    +import org.apache.nifi.cluster.protocol.NodeIdentifier;
     import org.apache.nifi.cluster.protocol.NodeProtocolSender;
     import org.apache.nifi.cluster.protocol.ProtocolException;
     import org.apache.nifi.cluster.protocol.message.HeartbeatMessage;
    -import org.apache.zookeeper.WatchedEvent;
    -import org.apache.zookeeper.Watcher;
    +import org.apache.nifi.cluster.protocol.message.HeartbeatResponseMessage;
    +import org.apache.nifi.controller.leader.election.LeaderElectionManager;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;

     /**
    - * Uses ZooKeeper in order to determine which node is the elected Cluster Coordinator and to indicate
    - * that this node is part of the cluster. However, once the Cluster Coordinator is known, heartbeats are
    + * Uses Leader Election Manager in order to determine which node is the elected Cluster Coordinator and to indicate
    + * that this node is part of the cluster. Once the Cluster Coordinator is known, heartbeats are
      * sent directly to the Cluster Coordinator.
      */
     public class ClusterProtocolHeartbeater implements Heartbeater {
         private static final Logger logger = LoggerFactory.getLogger(ClusterProtocolHeartbeater.class);

         private final NodeProtocolSender protocolSender;
    -    private final CuratorFramework curatorClient;
    -    private final String nodesPathPrefix;
    -
    -    private final String coordinatorPath;
    -    private volatile String coordinatorAddress;
    +    private final LeaderElectionManager electionManager;
    +    private final ClusterCoordinator clusterCoordinator;

    -
    -    public ClusterProtocolHeartbeater(final NodeProtocolSender protocolSender, final Properties properties) {
    +    public ClusterProtocolHeartbeater(final NodeProtocolSender protocolSender, final ClusterCoordinator clusterCoordinator, final LeaderElectionManager electionManager) {
             this.protocolSender = protocolSender;
    -
    -        final RetryPolicy retryPolicy = new RetryNTimes(10, 500);
    -        final ZooKeeperClientConfig zkConfig = ZooKeeperClientConfig.createConfig(properties);
    -
    -        curatorClient = CuratorFrameworkFactory.newClient(zkConfig.getConnectString(),
    -            zkConfig.getSessionTimeoutMillis(), zkConfig.getConnectionTimeoutMillis(), retryPolicy);
    -
    -        curatorClient.start();
    -        nodesPathPrefix = zkConfig.resolvePath("cluster/nodes");
    -        coordinatorPath = nodesPathPrefix + "/coordinator";
    +        this.clusterCoordinator = clusterCoordinator;
    +        this.electionManager = electionManager;
         }

         @Override
         public String getHeartbeatAddress() throws IOException {
    -        final String curAddress = coordinatorAddress;
    -        if (curAddress != null) {
    -            return curAddress;
    +        final String heartbeatAddress = electionManager.getLeader(ClusterRoles.CLUSTER_COORDINATOR);
    +        if (heartbeatAddress == null) {
    +            throw new ProtocolException("Cannot send heartbeat because there is no Cluster Coordinator currently elected");
             }

    -        try {
    -            // Get coordinator address and add watcher to change who we are heartbeating to if the value changes.
    -            final byte[] coordinatorAddressBytes = curatorClient.getData().usingWatcher(new Watcher() {
    -                @Override
    -                public void process(final WatchedEvent event) {
    -                    coordinatorAddress = null;
    -                }

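
The replacement `getHeartbeatAddress()` in the diff above no longer caches an address read from ZooKeeper; it asks the election manager on every call and fails if no coordinator is elected. The following self-contained sketch mirrors that flow under stated assumptions: `LeaderLookup`, `InMemoryLeaderLookup`, and `HeartbeaterSketch` are hypothetical names, the role constant's value is made up, and `IllegalStateException` stands in for NiFi's `ProtocolException`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single source of truth for "who currently holds a role". */
interface LeaderLookup {
    String getLeader(String roleName);
}

/** In-memory stand-in for an election manager, used only for illustration. */
class InMemoryLeaderLookup implements LeaderLookup {
    private final Map<String, String> leaders = new ConcurrentHashMap<>();

    void setLeader(String roleName, String address) {
        leaders.put(roleName, address);
    }

    void clearLeader(String roleName) {
        leaders.remove(roleName);
    }

    @Override
    public String getLeader(String roleName) {
        return leaders.get(roleName);
    }
}

class HeartbeaterSketch {
    // Assumed role name; the real constant lives in NiFi's ClusterRoles.
    static final String CLUSTER_COORDINATOR = "Cluster Coordinator";

    private final LeaderLookup lookup;

    HeartbeaterSketch(LeaderLookup lookup) {
        this.lookup = lookup;
    }

    /** Resolves the address on every call, so there is no locally cached value to go stale. */
    String getHeartbeatAddress() {
        String address = lookup.getLeader(CLUSTER_COORDINATOR);
        if (address == null) {
            throw new IllegalStateException(
                "Cannot send heartbeat because there is no Cluster Coordinator currently elected");
        }
        return address;
    }
}
```

Because the heartbeater holds no copy of the coordinator address, it cannot disagree with the election manager about who the coordinator is, which is the kind of drift this issue describes.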
[jira] [Commented] (NIFI-2566) Clustered Nodes can become out of sync regarding which node is coordinator

[ https://issues.apache.org/jira/browse/NIFI-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421632#comment-15421632 ]

ASF GitHub Bot commented on NIFI-2566:
--------------------------------------

GitHub user markap14 opened a pull request:

    https://github.com/apache/nifi/pull/866

    NIFI-2566: Refactoring to improve robustness of cluster

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markap14/nifi NIFI-2566

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #866

commit 90dc74627157296b4084ed65c1b752bfe25db1e8
Author: Mark Payne
Date:   2016-08-13T23:38:07Z

    NIFI-2566: Refactored to allow just the Leader Election Manager to be responsible for determining who is the Cluster Coordinator

commit 3fd121b2ff57c5fc51171791351580c29c58e553
Author: Mark Payne
Date:   2016-08-14T23:40:31Z

    NIFI-2566: Removed storage of cluster roles from heartbeats and NodeConnectionStatus; use LeaderElectionManager to determine roles instead

commit 457a2df5c3296013ce896ca7900807d9bdb69a71
Author: Mark Payne
Date:   2016-08-15T20:35:56Z

    NIFI-2566: Updated Heartbeats so that if a node is out-of-sync with cluster topology, cluster coordinator will provide updated information back to the nodes
