Scott,

Do you have a constant flow of data from your database, or is it more
like a large batch comes in, processes in NiFi for a while, and then
some time later you pull another batch?

If it is more like the batch scenario, you might be able to stick a
ControlRate processor before your "check if done" processor to throttle the
flow files entering the loop. This obviously doesn't work well if you have
a constant flow of new data entering the loop because it will just make
everything before the loop back up as well, but it might be reasonable
while working on a given batch.

You can also increase the back-pressure threshold on all of the queues if
you have enough memory allocated to the NiFi JVM. Right-click on each queue
and choose configure. I believe they default to 10k flow files or 1GB.
Based on the screenshot they are hitting the 10k threshold, so you could
bump this up a bit to give more breathing room.

-Bryan


On Mon, Feb 27, 2017 at 6:21 PM, Matt Foley <[email protected]> wrote:

> If I understand correctly, your desired goal is for each input row that
> specifies a range, A to A+N, you would generate a sequence of N (or perhaps
> N+1) flowfiles, right?  And the only difference in each flowfile is that
> you’ve replaced the range specification with a single number from that
> range?
>
>
>
> I would suggest that at the level of the row input, you use ExecuteScript
> to expand each input row into N rows, with the substituted number values,
> then run that through SplitText, to get one row per flowfile.  This should
> be way more efficient, as well as much safer than a cyclic graph.
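
To make the row expansion above concrete, here is a minimal sketch of just
the expansion logic in plain Python. The row format (comma-separated fields
with one "LOW-HIGH" range field) and the names expand_row and expand_rows
are assumptions for illustration, not anything from Scott's flow. Inside
ExecuteScript this logic would still need to be wrapped in the usual
read/write of the flowfile content, which is not shown here.

```python
import re

# Assumption: each row contains one range token such as "100-200".
RANGE = re.compile(r"\b(\d+)-(\d+)\b")


def expand_row(row):
    """Return one row per number in the range, with the range replaced by that number."""
    match = RANGE.search(row)
    if match is None:
        return [row]  # no range found: pass the row through unchanged
    low, high = int(match.group(1)), int(match.group(2))
    prefix, suffix = row[:match.start()], row[match.end():]
    # Inclusive upper bound, mirroring the "current <= upper boundary" check in the loop.
    return [prefix + str(n) + suffix for n in range(low, high + 1)]


def expand_rows(text):
    """Expand every non-empty line of the input and join the result with newlines."""
    out = []
    for row in text.splitlines():
        if row.strip():
            out.extend(expand_row(row))
    return "\n".join(out)
```

The output is one line per number, so a SplitText with a line split count of
1 after it should give one row per flowfile without any loop in the graph.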
>
>
>
> Cheers,
>
> --Matt
>
>
>
> *From: *Scott Wagner <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Monday, February 27, 2017 at 2:34 PM
> *To: *"[email protected]" <[email protected]>
> *Subject: *How to gracefully handle a circular graph?
>
>
>
> Hello all,
>
>     I have created a graph where I am downloading a number of rows from an
> SQL database, and each row defines a range of numbers (100-200, 700-1500,
> etc.).  What I am then doing on the NiFi side is generating an individual
> FlowFile for each number in that range.  The way that I was accomplishing
> this was by setting a "current" attribute to the lower boundary and
> another attribute to the upper boundary, and then creating two
> queues off the "success" output for a Processor (the ReplaceText processor
> in the bottom right of the image), one of which goes on to process that
> number's record (going off the bottom right in the picture), and the other
> one of which goes off to a processor to increment the "current" number, and
> will then forward it to the processor that will check to make sure that
> "current" is less than or equal to "upper boundary".
>
>     This works great, until the queues end up filling up.  Once this
> happens, I have a gridlock situation where none of the processors in this
> triangle are running any longer, because they all have a full output
> queue.  I have tried searching the Internet and put a little thought into
> how I could make it so that my "Check if done" processor would prefer
> entries that are coming in from the circular portion of the graph, but so
> far haven't been able to come up with anything.  What I have tried is
> making both of the input queues to "Check if done" go through a funnel, and
> set an Oldest FlowFile prioritizer, but it still eventually ends up filling
> up the entire triangle of queues.
>
>
>
>     Does anyone have a suggestion as to how I could gracefully handle a
> situation like this?  I appreciate any advice.
>
> Thanks!
>
> - Scott Wagner
>
