Jeremy,

I'm not aware of any bugs off the top of my head that would necessarily cause 
this, but version 1.6.0 is getting fairly old, so there may well be something 
that I've forgotten about. That being said, there are two types of bugs that I 
think are most probable here: (1) there isn't really that much data queued up, 
and NiFi is actually reporting the wrong size for the queue; or (2) one node in 
the cluster got out of sync in terms of the flow, and that node is actually 
configured without backpressure being applied.

So there are two things that I would recommend checking to help diagnose what 
is going on here. Firstly, is the huge backlog spread across all nodes or just 
one node in the cluster? To determine this, you can go to the Global menu (the 
hamburger menu) and open the Summary page. From there, if you go to the 
Connections tab and find the connection (it should be easy if you sort the 
table by queue size), you can click the button on the far right that shows the 
Cluster view, which will break down the size of the connection per node, so you 
know whether all nodes in the cluster have a huge queue size or just one.
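
If it's easier to check that programmatically, something along these lines 
against the REST API should give you the same per-node breakdown. This is just 
a rough sketch: the host/port and connection id below are placeholders (you can 
grab the real id from the URL when the connection is selected on the canvas), 
I'm assuming an unsecured cluster, and the exact response field names may vary 
a bit between versions.

    # Rough sketch: per-node queue size for one connection via the NiFi REST API.
    # Assumes an unsecured cluster; the host/port and connection id are placeholders.
    import requests

    NIFI_API = "http://nifi-host:8080/nifi-api"   # placeholder URL
    CONNECTION_ID = "your-connection-id"          # placeholder id from the canvas URL

    resp = requests.get(
        f"{NIFI_API}/flow/connections/{CONNECTION_ID}/status",
        params={"nodewise": "true"},              # ask for the per-node breakdown
    )
    resp.raise_for_status()
    status = resp.json()["connectionStatus"]

    # Each node snapshot carries the queued count/size as reported by that node.
    for node in status.get("nodeSnapshots", []):
        snap = node["statusSnapshot"]
        print(f"{node.get('address')}: {snap.get('queuedCount')} FlowFiles ({snap.get('queuedSize')})")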

Secondly, I would be curious to know what happens if you restart the node(s) 
with the huge backlog. Do the FlowFiles magically disappear on restart, with 
the queue showing a small number (indicative of the queue size just being 
wrong), or are they still there (indicative of the queue size being correct)?

Also, what operating system are you running? There was a recent bug where data 
was not being properly swapped back in on Windows, but I think that was 
introduced after 1.6.0 and then fixed quickly.

This should help us figure out where to focus energy in finding the problem.

Thanks
-Mark

On Aug 30, 2019, at 12:24 PM, Jeremy Pemberton-Pigott 
<fuzzych...@gmail.com> wrote:

Yes, there is one, but not near the output port of the SplitJson processor; 
it's shortly after the input port of a child PG. The output is actually 
connected to 3 child PGs, and each of those has an UpdateAttribute processor on 
its output port. The other PG's input port on the left is connected to a 
RouteOnAttribute processor inside it.

Queue of PG1 input -> input port to processors -> connection to 3 child PGs -> 
each PG has SplitJson after its input port -> processors -> UpdateAttribute -> 
queue to output port of child PG -> queue to output port of PG1 -> queue to PG2 
input (100s of millions in queue) -> input port to RouteOnAttribute -> ...

Regards,

Jeremy


On 30 Aug 2019, at 20:45, Bryan Bende 
<bbe...@gmail.com> wrote:

Can you show what is happening inside the first process group? Is there a 
SplitText processor with line count of 1?

On Fri, Aug 30, 2019 at 4:21 AM Jeremy Pemberton-Pigott 
<fuzzych...@gmail.com> wrote:
Hi Pierre,

I'm using NiFi version 1.6.0.

04/03/2018 08:16:22 UTC

Tagged nifi-1.6.0-RC3

From 7c0ee01 on branch NIFI-4995-RC3

FlowFile expiration = 0
Back pressure object threshold = 20000
Back pressure data size threshold = 1GB

The connection is just from the output port of 1 PG to the input port of 
another PG.  Inside the PG all the connections are using the same settings 
between processors.

Regards,

Jeremy

On Fri, Aug 30, 2019 at 4:14 PM Pierre Villard 
<pierre.villard...@gmail.com> wrote:
Hi Jeremy,

It seems very weird that you get 200M flow files in a relationship that should 
have backpressure set at 20k flow files. While backpressure is not a hard 
limit, you should not get to such numbers. Can you give us more details? What 
version
of NiFi are you using? What's the configuration of your relationship between 
your two process groups?

Thanks,
Pierre

On Fri, Aug 30, 2019 at 7:46 AM, Jeremy Pemberton-Pigott 
<fuzzych...@gmail.com> wrote:
Hi,

I have a 3-node NiFi 1.6.0 cluster.  It ran out of disk space when there was a 
logjam of FlowFiles (from slow HBase lookups).  My queue is configured for 
20,000, but one node has over 206 million FlowFiles stuck in the queue.  I 
managed to clear up some disk space to get things going again, but it seems 
that after a few minutes of processing, all the processors in the Log Parser 
process group will stop processing and show zero in/out.

Is this a bug fixed in a later version?

Each time, I have to tear down the Docker containers running NiFi and restart 
it to process a few tens of thousands, and repeat every few minutes.  Any idea 
what I should do to keep it processing the data (nifi-app.log doesn't show me 
anything unusual about the stop or delay) until the one node can clear the 
backlog?

<image.png>

Regards,

Jeremy
--
Sent from Gmail Mobile
