[jira] [Commented] (NIFI-12232) Frequent "failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption"

2024-02-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818999#comment-17818999
 ] 

ASF subversion and git services commented on NIFI-12232:


Commit 9ca8bfc7520ceba9f090de1db8c117d8b860184e in nifi's branch 
refs/heads/support/nifi-1.x from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=9ca8bfc752 ]

NIFI-12232 Corrected Group Component ID Handling for Clustered Flows

Ensured that if a Process Group doesn't have a Versioned Component ID we use 
the ComponentIdLookup to create one based on its Instance ID in the same way 
that is done when serializing the flow; this ensures matching ID's when we 
synchronize flows across the cluster. Also included some code cleanup around 
failure handling on startup

This closes #8432

Signed-off-by: David Handermann 
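
The idea in the commit message above, deriving a Versioned Component ID from the Instance ID so that every node computes the same value, can be illustrated with a small sketch. It assumes the ID is a name-based (version 3) UUID computed from the Instance ID bytes, which is what Java's UUID.nameUUIDFromBytes produces; the function name is hypothetical and the actual ComponentIdLookup implementation may differ.

```python
import hashlib
import uuid


def versioned_id_from_instance_id(instance_id: str) -> str:
    """Derive a stable, name-based (version 3) UUID from an instance ID.

    Assumption: mirrors Java's UUID.nameUUIDFromBytes, i.e. an MD5 digest of
    the raw bytes with the version and variant bits set afterwards.
    """
    digest = hashlib.md5(instance_id.encode("utf-8")).digest()
    return str(uuid.UUID(bytes=digest, version=3))


# Two nodes holding the same Instance ID derive the same Versioned Component
# ID without exchanging any state.
instance_id = "4e8898ea-018b-1000-14a4-b336fae6590e"
node_a_id = versioned_id_from_instance_id(instance_id)
node_b_id = versioned_id_from_instance_id(instance_id)
print(node_a_id == node_b_id)
```

Because the derivation is a pure function of the Instance ID, a group that never received a Versioned Component ID should still match on every node during synchronization, which is the property the commit message describes.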


> Frequent "failed to connect node to cluster because local flow controller 
> partially updated. Administrator should disconnect node and review flow for 
> corruption"
> -
>
> Key: NIFI-12232
> URL: https://issues.apache.org/jira/browse/NIFI-12232
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Configuration Management
>Affects Versions: 1.23.2
>Reporter: John Joseph
>Assignee: Mark Payne
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: image-2023-10-16-16-12-31-027.png, 
> image-2024-02-14-13-33-44-354.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is an issue that we have been observing in NiFi 1.23.2 when we try to 
> upgrade.
> Since rolling upgrades are not supported in NiFi, we scale out the revision 
> that is running and {_}run a helm upgrade{_}.
> We have NiFi running in k8s cluster mode, and there is a post job that calls 
> the Tenants and Policies API. A successful run looks like this:
> {code:java}
> set_policies() Action: 'read' Resource: '/flow' entity_id: 
> 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' 
> entity_type: 'USER'
> set_policies() status: '200'
> 'read' '/flow' policy already exists. It will be updated...
> set_policies() fetching policy inside -eq 200 status: '200'
> set_policies() after update PUT: '200'
> set_policies() Action: 'read' Resource: '/tenants' entity_id: 
> 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' 
> entity_type: 'USER'
> set_policies() status: '200'{code}
> *_This job was running fine in 1.23.0, 1.22 and other previous versions._* In 
> {*}{{1.23.2}}{*}, we are noticing that the job is failing very frequently 
> with the following error logs:
> {code:java}
> set_policies() Action: 'read' Resource: '/flow' entity_id: 
> 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' 
> entity_type: 'USER'
> set_policies() status: '200'
> 'read' '/flow' policy already exists. It will be updated...
> set_policies() fetching policy inside -eq 200 status: '200'
> set_policies() after update PUT: '400'
> An error occurred getting 'read' '/flow' policy: 'This node is disconnected 
> from its configured cluster. The requested change will only be allowed if the 
> flag to acknowledge the disconnected node is set.'{code}
> {{_*'This node is disconnected from its configured cluster. The requested 
> change will only be allowed if the flag to acknowledge the disconnected node 
> is set.'*_}}
> The job is configured to run only after all the pods are up and running. 
> Though the pods are up, we see this exception inside the pods:
> {code:java}
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1059)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:667)
> at 
> org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:107)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:396)
> at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because the current destination is running
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:448)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:206)
> at 
> 

[jira] [Commented] (NIFI-12232) Frequent "failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption"

2024-02-20 Thread David Handermann (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818922#comment-17818922
 ] 

David Handermann commented on NIFI-12232:
-

Merged initial PR with corrections to the main branch for NiFi 2.0.0. Based on 
other substantive changes to leader election components, a separate PR will be 
necessary for the support branch.

> Frequent "failed to connect node to cluster because local flow controller 
> partially updated. Administrator should disconnect node and review flow for 
> corruption"
> -
>
> Key: NIFI-12232
> URL: https://issues.apache.org/jira/browse/NIFI-12232
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Configuration Management
>Affects Versions: 1.23.2
>Reporter: John Joseph
>Assignee: Mark Payne
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: image-2023-10-16-16-12-31-027.png, 
> image-2024-02-14-13-33-44-354.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is an issue that we have been observing in NiFi 1.23.2 when we try to 
> upgrade.
> Since rolling upgrades are not supported in NiFi, we scale out the revision 
> that is running and {_}run a helm upgrade{_}.
> We have NiFi running in k8s cluster mode, and there is a post job that calls 
> the Tenants and Policies API. A successful run looks like this:
> {code:java}
> set_policies() Action: 'read' Resource: '/flow' entity_id: 
> 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' 
> entity_type: 'USER'
> set_policies() status: '200'
> 'read' '/flow' policy already exists. It will be updated...
> set_policies() fetching policy inside -eq 200 status: '200'
> set_policies() after update PUT: '200'
> set_policies() Action: 'read' Resource: '/tenants' entity_id: 
> 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' 
> entity_type: 'USER'
> set_policies() status: '200'{code}
> *_This job was running fine in 1.23.0, 1.22 and other previous versions._* In 
> {*}{{1.23.2}}{*}, we are noticing that the job is failing very frequently 
> with the following error logs:
> {code:java}
> set_policies() Action: 'read' Resource: '/flow' entity_id: 
> 'ad2d3ad6-5d69-3e0f-95e9-c7feb36e2de5' entity_name: 'CN=nifi-api-admin' 
> entity_type: 'USER'
> set_policies() status: '200'
> 'read' '/flow' policy already exists. It will be updated...
> set_policies() fetching policy inside -eq 200 status: '200'
> set_policies() after update PUT: '400'
> An error occurred getting 'read' '/flow' policy: 'This node is disconnected 
> from its configured cluster. The requested change will only be allowed if the 
> flag to acknowledge the disconnected node is set.'{code}
> {{_*'This node is disconnected from its configured cluster. The requested 
> change will only be allowed if the flag to acknowledge the disconnected node 
> is set.'*_}}
> The job is configured to run only after all the pods are up and running. 
> Though the pods are up, we see this exception inside the pods:
> {code:java}
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1059)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:667)
> at 
> org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:107)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:396)
> at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because the current destination is running
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:448)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:206)
> at 
> org.apache.nifi.controller.serialization.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:42)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1530)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:104)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:817)
> at 
> 

[jira] [Commented] (NIFI-12232) Frequent "failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption"

2024-02-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818917#comment-17818917
 ] 

ASF subversion and git services commented on NIFI-12232:


Commit a821966a870a494bd0a7eb0ae15822e06b8d21c5 in nifi's branch 
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a821966a87 ]

NIFI-12232 Corrected Group Component ID Handling for Clustered Flows

Ensured that if a Process Group doesn't have a Versioned Component ID we use 
the ComponentIdLookup to create one based on its Instance ID in the same way 
that is done when serializing the flow; this ensures matching ID's when we 
synchronize flows across the cluster. Also included some code cleanup around 
failure handling on startup

This closes #8406

Signed-off-by: David Handermann 



[jira] [Commented] (NIFI-12232) Frequent "failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption"

2024-02-16 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818091#comment-17818091
 ] 

Joe Witt commented on NIFI-12232:
-

Also hit by

https://apachenifi.slack.com/archives/C0L9VCD47/p1708113098305609

Roman Wesołowski
Hi all,
I have a 3-node NiFi cluster running version 2.0.0-M1. Until today everything 
was working correctly, but during my development something strange happened. 
For some reason 2 nodes disconnected from the cluster, and I am not able to 
reconnect them. I have restarted the nodes, but without success. All machines 
are up but cannot connect to each other. Any help would be appreciated.
2024-02-16 15:14:36,663 ERROR [Reconnect to Cluster] 
o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.120.8.252:8080 -- 
Node disconnected from cluster due to 
org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
to connect node to cluster because local flow controller partially updated. 
Administrator should disconnect node and review flow for corruption.


[jira] [Commented] (NIFI-12232) Frequent "failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption"

2024-02-14 Thread René Zeidler (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817377#comment-17817377
 ] 

René Zeidler commented on NIFI-12232:
-

I've encountered the same issue. It's happened since at least 1.23.2, and I can 
reliably reproduce it on 1.25.0 and 2.0.0-M2 as well.

I've been able to create minimal reproduction steps that do not require any 
non-standard setup. The issue is independent of any specific processors or any 
complicated flow setup. It _always_ occurs when a node that contains a process 
group that hasn't been "fully synced" disconnects from the cluster and then 
tries to reconnect. I'll explain what that means in the reproduction steps.
h2. Minimal Reproduction Steps
 # Set up a NiFi {*}cluster with at least 3 nodes{*}, using all default settings.
You may adjust {{nifi.cluster.flow.election.max.wait.time}} and 
{{nifi.cluster.flow.election.max.candidates}} to make the node connection 
process faster, but this isn't necessary to reproduce the bug.
 # I'll call the nodes {*}Node A{*}, {*}Node B{*}, and {*}Node C{*}.
Open the web interface for Node A and Node B.
 # On {*}Node A{*}, create a new {*}process group{*}. In that process group, 
create a very simple flow: GenerateFlowFile going into UpdateAttribute going 
into a funnel. Start the UpdateAttribute processor. Like this:
!image-2024-02-14-13-33-44-354.png!
The exact flow doesn't matter; all that's necessary to reproduce the bug is a 
*running processor* with an {*}incoming and outgoing connection{*}.
 # On {*}Node B{*}, observe that the process group has _automatically synced_ 
(Right click -> Refresh if you don't want to wait).
 # On {*}Node A{*}, go to *Menu -> Cluster* (top right hamburger menu). 
{*}Disconnect Node B{*}. Click refresh (bottom left) until the node has 
disconnected.
 # Right after it was disconnected, *connect Node B* again. Click refresh to 
see the status change. It will change to CONNECTING and quickly back to 
DISCONNECTED. Check the log file for Node B. You will see the following 
exception:
{{o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
due to: org.apache.nifi.controller.serialization.FlowSynchronizationException: 
Failed to connect node to cluster because local flow controller partially 
updated. Administrator should disconnect node and review flow for corruption.}}
[...]
{{Caused by: 
org.apache.nifi.controller.serialization.FlowSynchronizationException: 
java.lang.IllegalStateException: Cannot change destination of Connection 
because the current destination is running}}
 # On {*}Node B{*}, you will get the warning that the node is disconnected from 
the cluster ({_}This node is currently not connected to the cluster. Any 
modifications to the data flow made here will not replicate across the 
cluster.{_})
Go into the process group. Observe that the UpdateAttribute processor is 
{*}running{*}, which is the direct cause of the exception.
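
Steps 5 and 6 above can also be scripted instead of clicked through. The following is a minimal sketch, assuming the same REST call the Cluster dialog issues (a PUT to /nifi-api/controller/cluster/nodes/{id} with a "node" entity carrying the desired status). The base URL and node ID shown are placeholders, authentication is left out, and the helper name is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def node_status_request(base_url: str, node_id: str, status: str) -> urllib.request.Request:
    """Build (but do not send) the request that changes a node's cluster status.

    Assumption: status is "DISCONNECTING" to disconnect a node and
    "CONNECTING" to reconnect it, as in the Cluster dialog.
    """
    if status not in ("CONNECTING", "DISCONNECTING"):
        raise ValueError(f"unsupported status: {status}")
    url = "{}/nifi-api/controller/cluster/nodes/{}".format(
        base_url.rstrip("/"), urllib.parse.quote(node_id, safe="")
    )
    body = json.dumps({"node": {"nodeId": node_id, "status": status}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "application/json"}
    )


# Placeholder values: the request is sent to Node A and targets Node B.
disconnect = node_status_request("http://node-a.example:8080", "node-b-id", "DISCONNECTING")
reconnect = node_status_request("http://node-a.example:8080", "node-b-id", "CONNECTING")
print(disconnect.get_method(), disconnect.full_url)
```

Sending the two requests in sequence with urllib.request.urlopen (plus whatever credentials the cluster requires) should reproduce the CONNECTING to DISCONNECTED flip described in step 6. The node ID can be read from the cluster listing that the same dialog displays.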

h3. Temporary fix
 # On {*}Node B{*}, *stop* the UpdateAttribute processor.
 # On {*}Node A{*}, *connect Node B* again. This time it will work and Node B 
successfully reconnects to the cluster.
 # However, this only allows Node B to reconnect once. The process group on 
Node B is still in an inconsistent state and will fail to reconnect the next 
time. Repeat steps 5 - 7 above to confirm that the issue persists.

h3. Permanent fix
 # On {*}Node B{*}, stop the UpdateAttribute processor and then {*}delete the 
whole process group{*}. Since Node B is currently disconnected from the 
cluster, this will only delete the process group locally on this node.
 # On {*}Node A{*}, *connect Node B* again. The reconnection will be 
successful and the deleted process group will sync back to Node B. Since the 
whole process group was missing, this will now be a "full sync".
 # This specific process group on this specific node (Node B) is now "fixed". 
It will not cause this issue anymore.
To confirm, repeat steps 5 and 6 above. You can disconnect and reconnect Node B 
without issues.

h2. Further notes
 * Instead of deleting the process group, you can also stop the disconnected 
node completely, delete the flow.json/flow.xml, and start it again. It will 
join the cluster again, and all process groups will be "fully synced". This fix 
was described in previous comments, but is not necessary to reproduce the issue.
 * The fix applies per process group and per node. After fixing the issue for 
Node B with the "permanent fix" above, it will still affect Node C. If you 
disconnect and try to reconnect Node C it will throw the same exception.
 * Also, the node where you initially created the flow (in this example Node 
A) is _not_ exempt. If you go to Node C, then disconnect and try to reconnect 
Node A, it will throw the same exception.
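
The first note above (stop the node, remove its local flow files, start it again) can be sketched as a small helper. This is only an illustration: it assumes the default file names flow.json.gz and flow.xml.gz in the node's conf directory, and it renames the files instead of deleting them so they can be restored. The function name and the backup suffix are hypothetical. It should only be run while NiFi is stopped on that node.

```python
import time
from pathlib import Path

# Assumed default flow file names; adjust if nifi.properties points elsewhere.
FLOW_FILE_NAMES = ("flow.json.gz", "flow.xml.gz")


def set_aside_flow_files(conf_dir, suffix=None):
    """Rename the local flow files so the node starts without a local flow.

    Returns the list of backup paths that were created.
    """
    conf = Path(conf_dir)
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in FLOW_FILE_NAMES:
        src = conf / name
        if src.exists():
            dst = conf / f"{name}.bak-{suffix}"
            src.rename(dst)
            moved.append(dst)
    return moved
```

Calling set_aside_flow_files("/opt/nifi/conf") on the stopped node (the path is a placeholder) and then starting NiFi should let the node fetch the whole flow from the cluster, as described in the note.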

h2. Full error log

 
{code:java}
2024-02-14 12:49:40,487 INFO [Reconnect to Cluster] 
o.a.nifi.controller.StandardFlowService Processing reconnection request 

[jira] [Commented] (NIFI-12232) Frequent "failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption"

2023-10-27 Thread Jonathan Johnson (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780464#comment-17780464
 ] 

Jonathan Johnson commented on NIFI-12232:
-

Hello,

I am also seeing a number of "Failed to connect node to cluster..." errors 
in my system, leaving nodes disconnected from the cluster until I stop NiFi 
and allow the node to recreate its canvas from the cluster. I am trying to 
determine whether we are experiencing the same issue or whether this should be 
its own ticket.
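
One way to check whether two reports share a root cause is to compare the "Caused by" chains rather than the top-level message, which is identical for every failed reconnection. Below is a minimal sketch that extracts those chains from a log excerpt, including excerpts that have been line-wrapped by mail. The function names are hypothetical and the matching rule is only a heuristic.

```python
import re

CAUSED_BY = re.compile(r"^Caused by:\s*(.*)$")
SIGNATURE = "Cannot change destination of Connection because the current destination is running"


def root_causes(log_text):
    """Return each 'Caused by' message, re-joining lines wrapped by mail."""
    lines = [line.strip() for line in log_text.splitlines()]
    causes = []
    i = 0
    while i < len(lines):
        match = CAUSED_BY.match(lines[i])
        if not match:
            i += 1
            continue
        parts = [match.group(1)]
        i += 1
        # Collect continuation lines until a stack frame, an elision marker,
        # a blank line, or the next 'Caused by' starts.
        while i < len(lines):
            nxt = lines[i]
            if (not nxt or nxt == "at" or nxt.startswith("at ")
                    or nxt.startswith("...") or CAUSED_BY.match(nxt)):
                break
            parts.append(nxt)
            i += 1
        causes.append(" ".join(p for p in parts if p))
    return causes


def matches_reported_failure(log_text):
    """True if any cause carries the message seen in this ticket's stack trace."""
    return any(SIGNATURE in cause for cause in root_causes(log_text))
```

Running matches_reported_failure over the relevant section of nifi-app.log gives a quick first check; a match suggests the same symptom, but it does not by itself prove the same underlying bug.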

My logs immediately before the "Failed to connect ..." errors look like:
{code:java}
2023-10-27 09:41:08,373 INFO [Heartbeat Monitor Thread-1] 
o.a.n.c.c.node.NodeClusterCoordinator Status of node1.nifi:9441 changed from 
NodeConnectionStatus[nodeId=node1.nifi:9441, state=DISCONNECTED, Disconnect 
Code=Lack of Heartbeat, Disconnect Reason=Have not received a heartbeat from 
node in 3 seconds, updateId=83] to NodeConnectionStatus[nodeId=node1.nifi:9441, 
state=CONNECTING, updateId=84]

2023-10-27 09:41:08,452 INFO [Reconnect to Cluster] 
o.a.nifi.controller.StandardFlowService Processing reconnection request from 
cluster coordinator.

2023-10-27 09:41:08,720 INFO [Reconnect to Cluster] 
o.a.nifi.groups.StandardProcessGroup 
Template[id=9cd572fa-2d18-3228-9a10-8f2438ae9459] removed from flow
2023-10-27 09:41:08,722 INFO [Reconnect to Cluster] 
o.a.n.f.s.StandardVersionedComponentSynchronizer No differences between current 
flow and proposed flow for 
StandardProcessGroup[identifier=4e8898ea-018b-1000-14a4-b336fae6590e,name=NiFi 
Flow]
2023-10-27 09:41:08,724 INFO [Reconnect to Cluster] 
o.a.nifi.groups.StandardProcessGroup 
StandardFunnel[id=71590595-018b-1000--94fe7f50-temp-funnel] added to 
StandardProcessGroup[identifier=71590595-018b-1000--94fe7f50,name=testetst]
2023-10-27 09:41:08,731 INFO [Reconnect to Cluster] 
o.a.n.c.s.AffectedComponentSet Starting the following components: 
AffectedComponentSet[inputPorts=[], outputPorts=[], remoteInputPorts=[], 
remoteOutputPorts=[], processors=[], parameterProviders=[], 
flowRegistryCliens=[], controllerServices=[], reportingTasks=[]]
2023-10-27 09:41:08,731 INFO [Reconnect to Cluster] 
o.a.nifi.controller.StandardFlowService Disconnecting node due to Failed to 
properly handle Reconnection request due to 
org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
to connect node to cluster because local flow controller partially updated. 
Administrator should disconnect node and review flow for corruption.
2023-10-27 09:41:08,731 INFO [Reconnect to Cluster] 
o.apache.nifi.controller.FlowController Will no longer send heartbeats
2023-10-27 09:41:08,731 INFO [Reconnect to Cluster] 
o.apache.nifi.controller.FlowController FlowController will stop sending 
heartbeats to Cluster Coordinator
2023-10-27 09:41:08,731 INFO [Reconnect to Cluster] 
o.apache.nifi.controller.FlowController Cluster State changed from Clustered to 
Not Clustered
2023-10-27 09:41:08,731 INFO [Reconnect to Cluster] 
o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 
'Primary Node' because that role is not registered

2023-10-27 09:41:08,731 ERROR [Reconnect to Cluster] 
o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
due to: org.apache.nifi.controller.serialization.FlowSynchronizationException: 
Failed to connect node to cluster because local flow controller partially 
updated. Administrator should disconnect node and review flow for corruption.
org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
to connect node to cluster because local flow controller partially updated. 
Administrator should disconnect node and review flow for corruption.
    at 
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1059)
    
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: 
org.apache.nifi.controller.serialization.FlowSynchronizationException: 
java.lang.IllegalStateException: Cannot change destination of Connection 
because the current destination is running
    at 
org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:448)
    
    ... 4 common frames omitted
Caused by: java.lang.IllegalStateException: Cannot change destination of 
Connection because the current destination is running
    at 
org.apache.nifi.connectable.StandardConnection.setDestination(StandardConnection.java:310)
    at 
org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.updateConnectionDestinations(StandardVersionedComponentSynchronizer.java:700)
    at 
org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:405)
    
    ... 10 common frames omitted
2023-10-27 09:41:08,732 INFO [Reconnect to Cluster]