[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-04 Thread Nissim Shiman (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834034#comment-17834034
 ] 

Nissim Shiman commented on NIFI-12969:
--

Thank you very much [~markap14] [~joewitt] [~pgyori] for looking at this and 
the resolution.

I did some testing and verified the issue no longer occurs.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Mark Payne
>Priority: Critical
> Fix For: 2.0.0-M3, 1.26.0
>
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:954)
> ... 3 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833780#comment-17833780
 ] 

ASF subversion and git services commented on NIFI-12969:


Commit 212763dd79b3652c9870ff6a59db28c232113c55 in nifi's branch 
refs/heads/support/nifi-1.x from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=212763dd79 ]

NIFI-12969: Fixed a typo in the #isTempDestinationNecessary method, where we 
were comparing existingConnection.getProcessGroup() to 
newDestination.getProcessGroup(), instead of comparing 
existingConnection.getDestination().getProcessGroup() to 
newDestination.getProcessGroup(). I.e., we were comparing the Destination's PG 
to the Connection's PG instead of comparing Destination's PG to Destination's 
PG. This resulted in using a temporary funnel when we shouldn't. And as a 
result it attempted to change a Connection's Destination while the destination 
was running.

Signed-off-by: Joseph Witt 


> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Mark Payne
>Priority: Critical
> Fix For: 2.0.0-M3, 1.26.0
>
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833773#comment-17833773
 ] 

ASF subversion and git services commented on NIFI-12969:


Commit 3a03caa14968d0c9a2e6790373871e886b632f09 in nifi's branch 
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=3a03caa149 ]

NIFI-12969: Fixed a typo in the #isTempDestinationNecessary method, where we 
were comparing existingConnection.getProcessGroup() to 
newDestination.getProcessGroup(), instead of comparing 
existingConnection.getDestination().getProcessGroup() to 
newDestination.getProcessGroup(). I.e., we were comparing the Destination's PG 
to the Connection's PG instead of comparing Destination's PG to Destination's 
PG. This resulted in using a temporary funnel when we shouldn't. And as a 
result it attempted to change a Connection's Destination while the destination 
was running.

This closes #8600.

Signed-off-by: Joseph Witt 


> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Mark Payne
>Priority: Critical
> Fix For: 2.0.0-M3, 1.26.0
>
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread Peter Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833763#comment-17833763
 ] 

Peter Gyori commented on NIFI-12969:


Thank you [~markap14] ! It makes sense. I did not catch that creating the temp 
funnel wasn't necessary in the first place.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Mark Payne
>Priority: Critical
> Fix For: 2.0.0-M3, 1.26.0
>
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:954)
> ... 3 common frames omitted
> Caused by: 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread Mark Payne (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833738#comment-17833738
 ] 

Mark Payne commented on NIFI-12969:
---

[~Nissim Shiman] [~pgyori] I pushed a PR that appears to address the issue. I 
believe you're on the right track, that the situation is caused by the fact 
that the temp funnel was incorrectly used. But instead of trying to detect when 
it's going to happen and/or rollback, the issue is that we had a bug in the 
logic for when the temp funnel was created. In this case, there should never be 
a temp funnel. In cases where we DO need a temp funnel, the existing logic 
should handle stopping the Port, which would make this work smoothly. The issue 
arose here because the Port was (rightly) left running. We just need to avoid 
creating the temp funnel unnecessarily.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Mark Payne
>Priority: Critical
> Fix For: 2.0.0-M3, 1.26.0
>
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833713#comment-17833713
 ] 

Joe Witt commented on NIFI-12969:
-

thanks [~pgyori].  Will flag this as a key issue to look into for 1.26 and 
2.0m3.  If triaged and found to be manageable we can relax it but otherwise it 
will get some attention before we release.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Nissim Shiman
>Priority: Critical
> Fix For: 2.0.0-M3, 1.26.0
>
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:954)
> ... 3 common frames 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread Nissim Shiman (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833705#comment-17833705
 ] 

Nissim Shiman commented on NIFI-12969:
--

Thank you very much [~pgyori] for the testing, diagrams, logs and bringing out 
the finer details of this.
This greatly appreciated.

I am looking at a solution like you suggested (i.e. wait a little in 
StandardConnection) and after a short period of time, to give up and in the 
calling method undo the temp connections (and remove the temp funnel) to try to 
keep the graph the way it was before the temp connection(s) were made.

There doesn't appear to be an easy way to predict this situation to avoid 
making the temp connections in the first place. I've tried checking for 
unacknowleged flowfiles on all a group's connections before making 
temp-funnel/temp destinations (for any connection).   But even if it checks 
out, by the time we get to setting an individual connection's destination, 
there could be unacknowledged flowfiles on the connection again.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Nissim Shiman
>Priority: Major
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread Peter Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833636#comment-17833636
 ] 

Peter Gyori commented on NIFI-12969:


FYI [~markap14] , [~joewitt] . This concerns NiFi-2.0.0-M3.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Nissim Shiman
>Priority: Major
> Attachments: nifi-app.log, simple_flow.png, 
> simple_flow_with_temp-funnel.png
>
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:954)
> ... 3 common frames omitted
> Caused by: java.lang.IllegalStateException: Cannot change destination of 
> Connection because FlowFiles from this Connection are currently held by 
> 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-04-03 Thread Peter Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833586#comment-17833586
 ] 

Peter Gyori commented on NIFI-12969:


Thank you [~Nissim Shiman] for the detailed steps to reproduce the issue.

I managed to reproduce it with your steps even on a very recent build of NiFi, 
where the latest commit on the main branch is 
{noformat}
commit b574a7e4: NIFI-12987 allow controller service type to be 
searchable{noformat}
I used a 3-node cluster. On Apple Silicon one might need a little stricter 
settings for heartbeats:

 

 
{code:java}
nifi.cluster.protocol.heartbeat.interval=1
secnifi.cluster.protocol.heartbeat.missable.max=1 {code}
I needed two GenerateFlowFile processors, both with 2 simultaneous threads (and 
Run Schedule=0 min as you mentioned).

 

Screenshot of the test flow.

!simple_flow.png|width=645,height=323!

With the following log settings, I captured the attached [^nifi-app.log]

 
{code:java}


 {code}
I paste here the log entries that I find the most important. All of these 
happen within 2 milliseconds:

 

 
{code:java}
2024-04-03 13:48:53,896 TRACE [Timer-Driven Process Thread-2] 
org.apache.nifi.connectable.LocalPort 
LocalPort[id=a3c770bd-018e-1000--0d6b782a, type=INPUT_PORT, name=input, 
group=pg] obtained claim for FlowFileGate
2024-04-03 13:48:53,896 DEBUG [Reconnect to Cluster] 
o.a.n.f.s.StandardVersionedComponentSynchronizer Changing destination of 
Connection Connection[ID=a3c84edc-018e-1000--13bbf65e, Source 
ID=a6393e3a-3f79-1d9a-b4eb-2a25ca912ab9, Dest 
ID=a3c770bd-018e-1000--0d6b782a] from 
LocalPort[id=a3c770bd-018e-1000--0d6b782a, type=INPUT_PORT, name=input, 
group=pg] to StandardFunnel[id=a34c81f0-018e-1000-7faa-412adc4de192-temp-funnel]
2024-04-03 13:48:53,896 DEBUG [Timer-Driven Process Thread-2] 
org.apache.nifi.connectable.LocalPort 
LocalPort[id=a3c770bd-018e-1000--0d6b782a, type=INPUT_PORT, name=input, 
group=pg] Transferred 1 FlowFiles
2024-04-03 13:48:53,896 TRACE [Timer-Driven Process Thread-2] 
org.apache.nifi.connectable.LocalPort 
LocalPort[id=a3c770bd-018e-1000--0d6b782a, type=INPUT_PORT, name=input, 
group=pg] released claim for FlowFileGate
2024-04-03 13:48:53,896 INFO [Reconnect to Cluster] 
o.a.n.c.s.AffectedComponentSet Starting the following components: 
AffectedComponentSet[inputPorts=[], outputPorts=[], remoteInputPorts=[], 
remoteOutputPorts=[], processors=[], parameterProviders=[], 
flowRegistryClients=[], controllerServices=[], reportingTasks=[], 
flowAnalysisRules=[], statelessProcessGroups=[]]
2024-04-03 13:48:53,897 INFO [Reconnect to Cluster] 
o.a.nifi.controller.StandardFlowService Disconnecting node due to Failed to 
properly handle Reconnection request due to 
org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
to connect node to cluster because local flow controller partially updated. 
Administrator should disconnect node and review flow for corruption.
2024-04-03 13:48:53,897 INFO [Reconnect to Cluster] 
o.apache.nifi.controller.FlowController Will no longer send heartbeats
2024-04-03 13:48:53,897 INFO [Reconnect to Cluster] 
o.apache.nifi.controller.FlowController FlowController will stop sending 
heartbeats to Cluster Coordinator
2024-04-03 13:48:53,897 INFO [Reconnect to Cluster] 
o.apache.nifi.controller.FlowController Cluster State changed from Clustered to 
Not Clustered
2024-04-03 13:48:53,897 ERROR [Reconnect to Cluster] [...] 
    Caused by: java.lang.IllegalStateException: Cannot change destination of 
Connection because FlowFiles from this Connection are currently held by 
LocalPort[id=a3c770bd-018e-1000--0d6b782a, type=INPUT_PORT, name=input, 
group=pg]
    at 
org.apache.nifi.connectable.StandardConnection.setDestination(StandardConnection.java:299)
    at 
org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.updateConnectionDestinations(StandardVersionedComponentSynchronizer.java:706)
    at 
org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:423)
    at 
org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.lambda$synchronize$0(StandardVersionedComponentSynchronizer.java:248)
    at 
org.apache.nifi.controller.flow.AbstractFlowManager.withParameterContextResolution(AbstractFlowManager.java:638)
    at 
org.apache.nifi.flow.synchronization.StandardVersionedComponentSynchronizer.synchronize(StandardVersionedComponentSynchronizer.java:243)
    at 
org.apache.nifi.groups.StandardProcessGroup.synchronizeFlow(StandardProcessGroup.java:3863)
    at 
org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:464){code}
That is: we have an input port with id a3c770bd-018e-1000--0d6b782a and 
the following events happen:
 # obtained claim for FlowFileGate 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-03-29 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832247#comment-17832247
 ] 

Pierre Villard commented on NIFI-12969:
---

That is interesting, I'll definitely be extremely interested in whatever you 
find to solve this issue.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Nissim Shiman
>Priority: Major
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:954)
> ... 3 common frames omitted
> Caused by: java.lang.IllegalStateException: Cannot change destination of 
> Connection because FlowFiles from this Connection are currently held by 
> LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b
> , 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-03-29 Thread Nissim Shiman (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832242#comment-17832242
 ] 

Nissim Shiman commented on NIFI-12969:
--

[~pvillard] This is an excellent observation as the symptoms are very similar.  
I tried this on a 2.0.0-SNAPSHOT that is post the NIFI-12232 fix, but the issue 
still remains.  

On closer look, it appears the cause of this issue (in StandardConnection.java) 
is the next line after the one that caused NIFI-12232, so maybe that one was 
masking some occurrences of this this one, but, yes, you are correct they are 
in the same area of code.

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Nissim Shiman
>Priority: Major
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
> at 
> 

[jira] [Commented] (NIFI-12969) Under heavy load, nifi node unable to rejoin cluster, graph modified with temp funnel

2024-03-29 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832223#comment-17832223
 ] 

Pierre Villard commented on NIFI-12969:
---

[~Nissim Shiman] - I think there may be good chances that this is solved with 
NIFI-12232

> Under heavy load, nifi node unable to rejoin cluster, graph modified with 
> temp funnel
> -
>
> Key: NIFI-12969
> URL: https://issues.apache.org/jira/browse/NIFI-12969
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0, 2.0.0-M2
>Reporter: Nissim Shiman
>Assignee: Nissim Shiman
>Priority: Major
>
> Under heavy load, if a node leaves the cluster (due to heartbeat time out), 
> many times it is unable to rejoin the cluster.
> The nodes' graph will have been modified with a temp-funnel as well.
> Appears to be some sort of [timing 
> issue|https://github.com/apache/nifi/blob/rel/nifi-2.0.0-M2/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/connectable/StandardConnection.java#L298]
>  # To reproduce, on a nifi cluster of three nodes, set up:
> 2 GenerateFlowFile processors -> PG
> Inside PG:
> inputPort -> UpdateAttribute
>  # Keep all defaults except for the following:
> For UpdateAttribute terminate the success relationship
> One of the GenerateFlowFile processors can be disabled,
> the other one should have Run Schedule to be 0 min (this will allow for the 
> heavy load)
>  # In nifi.properties (on all 3 nodes) to allow for nodes to fall out of the 
> cluster, set: nifi.cluster.protocol.heartbeat.interval=2 sec  (default is 5) 
> nifi.cluster.protocol.heartbeat.missable.max=1   (default is 8)
> Restart nifi. Start flow. The nodes will quickly fall out and rejoin cluster. 
> After a few minutes one will likely not be able to rejoin.  The graph for 
> that node will have the disabled GenerateFlowFile now pointing to a funnel (a 
> temp-funnel) instead of the PG
> Stack trace on that nodes nifi-app.log will look like this: (this is from 
> 2.0.0-M2):
> {code:java}
> 2024-03-28 13:55:19,395 INFO [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Node disconnected due to Failed to 
> properly handle Reconnection request due to org.apache.nifi.control
> ler.serialization.FlowSynchronizationException: Failed to connect node to 
> cluster because local flow controller partially updated. Administrator should 
> disconnect node and review flow for corrup
> tion.
> 2024-03-28 13:55:19,395 ERROR [Reconnect to Cluster] 
> o.a.nifi.controller.StandardFlowService Handling reconnection request failed 
> due to: org.apache.nifi.controller.serialization.FlowSynchroniza
> tionException: Failed to connect node to cluster because local flow 
> controller partially updated. Administrator should disconnect node and review 
> flow for corruption.
> org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed 
> to connect node to cluster because local flow controller partially updated. 
> Administrator should disconnect node and
>  review flow for corruption.
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:985)
> at 
> org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:655)
> at 
> org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:384)
> at java.base/java.lang.Thread.run(Thread.java:1583)
> Caused by: 
> org.apache.nifi.controller.serialization.FlowSynchronizationException: 
> java.lang.IllegalStateException: Cannot change destination of Connection 
> because FlowFiles from this Connection
> are currently held by LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b, 
> type=INPUT_PORT, name=inputPort, group=innerPG]
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.synchronizeFlow(VersionedFlowSynchronizer.java:472)
> at 
> org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:223)
> at 
> org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1740)
> at 
> org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:91)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:805)
> at 
> org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:954)
> ... 3 common frames omitted
> Caused by: java.lang.IllegalStateException: Cannot change destination of 
> Connection because FlowFiles from this Connection are currently held by 
> LocalPort[id=99213c00-78ca-4848-112f-5454cc20656b
> , type=INPUT_PORT,