Joe,

The first thing that comes to mind is NIFI-6285, as Bryan points out. However, that would only affect you if you are running on Windows. So, the first question is: what operating system are you running on? :)

If it's not Windows, I would recommend getting some diagnostics info if possible. To do this, you can go to http://<hostname>:<port>/nifi-api/processors/<processor-id>/diagnostics. For example, if you get to NiFi by going to http://nifi01:8080/nifi, and you want diagnostics for the processor with ID 1234, then try going to http://nifi01:8080/nifi-api/processors/1234/diagnostics in your browser.

A couple of caveats on the 'diagnostics' approach above, though. It will only work if you are running an insecure NiFi instance, or if you are secured using certificates. We want the diagnostics for the Processor that is either the source of the connection or the destination of the connection - it doesn't matter which. This will give us a lot of information about the internal structure of the connection's FlowFile Queue. Of course, you said that your connection is between two Process Groups, which means that neither the source nor the destination is a Processor, so I would recommend creating a dummy Processor like UpdateAttribute and temporarily dragging the Connection so that it points to that Processor, just to get the diagnostic information, then dragging the connection back.

Of course, it would also be helpful to look for any errors in the logs. But if you are able to get the diagnostics info as described above, that's usually the best bet for debugging this sort of thing.

Thanks
-Mark

On Jun 4, 2019, at 11:13 AM, Bryan Bende <[email protected]> wrote:

Joe,

There are two known issues that possibly seem related...

The first was already addressed in 1.9.0, but the reason I mention it is because it was specific to a connection between two ports:
https://issues.apache.org/jira/browse/NIFI-5919

The second is not in a release yet, but is addressed in master, and has to do with swapping:
https://issues.apache.org/jira/browse/NIFI-6285

Seems like you wouldn't hit the first one since you are on 1.9.2, but it does seem odd that it is the same scenario.

Mark P probably knows best about debugging, but I'm guessing a thread dump while in this state would be helpful.

-Bryan

On Tue, Jun 4, 2019 at 10:56 AM Joe Gresock <[email protected]> wrote:

I have round robin load balanced connections working on one cluster, but on another, this type of connection seems to be stuck. What would be the best way to debug this problem? The connection is from one process group to another, so it's from an Output Port to an Input Port.

My configuration is as follows:

nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

And I ensured port 6342 is open from one node to another using the cluster node addresses.

Is there some error that should appear in the logs if flow files get stuck here? I suspect they are actually stuck, not just missing, because the remainder of the flow is back-pressured up until this point in the flow.

Thanks!
Joe
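
For anyone who would rather script the diagnostics request than paste the URL into a browser, the URL construction Mark describes can be sketched in Python roughly as follows. This is only an illustration: the helper names `diagnostics_url` and `fetch_diagnostics` are hypothetical and not part of NiFi, and the fetch assumes an unsecured instance whose endpoint returns JSON. A certificate-secured instance would need client-certificate handling that is not shown here.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def diagnostics_url(ui_url: str, processor_id: str) -> str:
    """Derive the diagnostics endpoint from the URL used to reach the NiFi UI.

    Keeps the scheme, host and port of ui_url and swaps the path for
    /nifi-api/processors/<processor-id>/diagnostics, as described in the thread.
    """
    parts = urlsplit(ui_url)
    path = f"/nifi-api/processors/{processor_id}/diagnostics"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def fetch_diagnostics(ui_url: str, processor_id: str, timeout: float = 30.0) -> dict:
    """Fetch and parse the diagnostics response.

    Assumption: the instance is unsecured and the response body is JSON.
    """
    url = diagnostics_url(ui_url, processor_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the example values from the message above, `diagnostics_url("http://nifi01:8080/nifi", "1234")` yields `http://nifi01:8080/nifi-api/processors/1234/diagnostics`.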

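
The check that port 6342 is reachable between nodes can also be repeated from a script. The sketch below is an assumption about how one might do that with a plain TCP connect; the function name `port_open` is hypothetical, and a successful connect only shows that something is listening on the port, not that NiFi load balancing itself is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run from each node against every other node's cluster address, for example `port_open("nifi01", 6342)`, where `nifi01` stands in for a real node hostname.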