Jeremy, I'm not sure of any bugs off the top of my head that would necessarily cause this, but version 1.6.0 is getting fairly old, so there may well be something that I've forgotten about. That said, there are two types of bugs that I think are most probable here: (1) there isn't really that much data queued up, and NiFi is actually reporting the wrong size for the queue; or (2) one node in the cluster got out of sync in terms of the flow, and that node is actually configured without backpressure.
So there are two things that I would recommend checking to help diagnose what is going on here.

Firstly, is the huge backlog spread across all nodes, or just on one node in the cluster? To determine this, open the Global ("hamburger") menu and go to the Summary page. From there, go to the Connections tab and find the connection (easy if you sort the table by queue size), then click the button on the far right that shows the cluster view. That breaks down the size of the connection per node, so you know whether all nodes in the cluster have a huge queue or just one.

Secondly, I would be curious to know what happens if you restart the node(s) with the huge backlog. Do the FlowFiles magically disappear on restart, with the queue showing a small number (indicating the queue size was just wrong), or are they still there (indicating the queue size is correct)?

Also, what operating system are you running? There was a recent bug about data not being properly swapped back in on Windows, but I think that was introduced after 1.6.0 and then fixed quickly.

This should help us know where to focus energy on finding the problem.

Thanks
-Mark

On Aug 30, 2019, at 12:24 PM, Jeremy Pemberton-Pigott <fuzzych...@gmail.com> wrote:

Yes, there is one, but it's not near the output port of the SplitJson processor; it's shortly after the input port of a child PG. The output is actually connected to 3 child PGs, and each of those has an UpdateAttribute processor on its output port. The other PG input port on the left is connected to a RouteOnAttribute processor inside it.

Queue of PG1 input -> input port to processors -> connection to 3 child PGs -> each PG has SplitJson after input port -> processors -> UpdateAttribute -> queue to output port of child PG -> queue to output port of PG1 -> queue to PG2 input (100s of millions in queue) -> input port to RouteOnAttribute -> ...
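For what it's worth, the per-node breakdown Mark describes in the UI can also be pulled from NiFi's REST API (GET /nifi-api/flow/connections/{id}/status?nodewise=true). A minimal sketch of tallying the queued count per node from the parsed response; the payload below is illustrative only, and the exact field names are from memory, so double-check against your NiFi version's API docs:

```python
def per_node_queue(status_json):
    """Map each cluster node's address to its queued FlowFile count.

    Assumes the nodewise connection-status response shape:
    {"connectionStatus": {"nodeSnapshots": [
        {"address": ..., "statusSnapshot": {"flowFilesQueued": ...}}, ...]}}
    """
    snapshots = status_json["connectionStatus"].get("nodeSnapshots", [])
    return {
        node["address"]: node["statusSnapshot"]["flowFilesQueued"]
        for node in snapshots
    }

# Illustrative payload only -- a real one would come from
# GET /nifi-api/flow/connections/{id}/status?nodewise=true
sample = {
    "connectionStatus": {
        "nodeSnapshots": [
            {"address": "node1", "statusSnapshot": {"flowFilesQueued": 18500}},
            {"address": "node2", "statusSnapshot": {"flowFilesQueued": 12000}},
            {"address": "node3", "statusSnapshot": {"flowFilesQueued": 206000000}},
        ]
    }
}

counts = per_node_queue(sample)
suspect = max(counts, key=counts.get)  # the node to restart / inspect first
print(suspect, counts[suspect])
```

If one address dominates the way node3 does here, that's the node to try restarting per Mark's second suggestion.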
Regards,

Jeremy

On 30 Aug 2019, at 20:45, Bryan Bende <bbe...@gmail.com> wrote:

Can you show what is happening inside the first process group? Is there a SplitText processor with a line count of 1?

On Fri, Aug 30, 2019 at 4:21 AM Jeremy Pemberton-Pigott <fuzzych...@gmail.com> wrote:

Hi Pierre,

I'm using NiFi version 1.6.0 (04/03/2018 08:16:22 UTC, tagged nifi-1.6.0-RC3, from 7c0ee01 on branch NIFI-4995-RC3).

FlowFile expiration = 0
Back pressure object threshold = 20000
Back pressure data size threshold = 1GB

The connection is just from the output port of one PG to the input port of another PG. Inside the PG, all the connections between processors use the same settings.

Regards,

Jeremy

On Fri, Aug 30, 2019 at 4:14 PM Pierre Villard <pierre.villard...@gmail.com> wrote:

Hi Jeremy,

It seems very weird that you get 200M flow files in a relationship that should have backpressure set at 20k flow files. While backpressure is not a hard limit, you should not get to such numbers. Can you give us more details? What version of NiFi are you using? What's the configuration of the relationship between your two process groups?

Thanks,
Pierre

On Fri, 30 Aug 2019 at 07:46, Jeremy Pemberton-Pigott <fuzzych...@gmail.com> wrote:

Hi,

I have a 3-node NiFi 1.6.0 cluster. It ran out of disk space when there was a log jam of flow files (from slow HBase lookups). My queue is configured for 20,000, but one node has over 206 million flow files stuck in the queue. I managed to clear up some disk space to get things going again, but it seems that after a few minutes of processing, all the processors in the Log Parser process group stop processing and show zero in/out. Is this a bug fixed in a later version? Each time, I have to tear down the Docker containers running NiFi and restart them to process a few tens of thousands of flow files, and repeat every few minutes.
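Mark's second hypothesis was that one node's flow got out of sync, so backpressure never applies there. One hedged way to check is to fetch the connection entity from each node's API directly (GET /nifi-api/connections/{id} against each host rather than the cluster endpoint) and compare the backpressure fields to the thresholds Jeremy listed. The helper, field names, and sample data below are illustrative assumptions, not output from this cluster:

```python
# Expected values from Jeremy's configuration.
EXPECTED_OBJECTS = 20000       # back pressure object threshold
EXPECTED_SIZE = "1 GB"         # back pressure data size threshold

def out_of_sync_nodes(entities_by_node):
    """Return the nodes whose connection's backpressure config differs
    from the expected thresholds.

    Assumes each entity has the usual connection shape:
    {"component": {"backPressureObjectThreshold": ...,
                   "backPressureDataSizeThreshold": ...}}
    """
    bad = []
    for node, entity in entities_by_node.items():
        component = entity["component"]
        if (component["backPressureObjectThreshold"] != EXPECTED_OBJECTS
                or component["backPressureDataSizeThreshold"] != EXPECTED_SIZE):
            bad.append(node)
    return bad

# Illustrative data only: node3 has silently lost its thresholds.
sample = {
    "node1": {"component": {"backPressureObjectThreshold": 20000,
                            "backPressureDataSizeThreshold": "1 GB"}},
    "node3": {"component": {"backPressureObjectThreshold": 0,
                            "backPressureDataSizeThreshold": "0 B"}},
}
print(out_of_sync_nodes(sample))  # -> ['node3']
```

A threshold of 0 means unlimited, which would match the "no backpressure on one node" symptom.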
Any idea what I should do to keep it processing the data (nifi-app.log doesn't show me anything unusual about the stop or delay) until the one node can clear the backlog?

<image.png>

Regards,

Jeremy

--
Sent from Gmail Mobile