Scott,

Do you have a constant flow of data from your database, or is it more that a large batch comes in and processes in NiFi for a while, and then some time later you pull another batch?
If it is more like the batch scenario, you might be able to put a ControlRate processor before your "check if done" processor to throttle the flow files entering the loop. This obviously doesn't work well if you have a constant flow of new data entering the loop, because it will just make everything before the loop back up as well, but it might be reasonable while working on a given batch.

You can also increase the back-pressure threshold on all of the queues if you have enough memory allocated to the NiFi JVM. Right-click on a queue and select configure. The thresholds default to 10k flow files or 1 GB, I believe. Based on the screenshot, the queues are hitting the 10k threshold, so you could bump this up a bit to give more breathing room.

-Bryan

On Mon, Feb 27, 2017 at 6:21 PM, Matt Foley <[email protected]> wrote:

> If I understand correctly, your desired goal is that for each input row
> that specifies a range, A to A+N, you would generate a sequence of N (or
> perhaps N+1) flowfiles, right? And the only difference in each flowfile is
> that you've replaced the range specification with a single number from
> that range?
>
> I would suggest that at the level of the row input, you use ExecuteScript
> to expand each input row into N rows, with the substituted number values,
> then run that through SplitText to get one row per flowfile. This should
> be way more efficient, as well as much safer than a cyclic graph.
>
> Cheers,
> --Matt
>
> *From: *Scott Wagner <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Monday, February 27, 2017 at 2:34 PM
> *To: *"[email protected]" <[email protected]>
> *Subject: *How to gracefully handle a circular graph?
>
> Hello all,
>
> I have created a graph where I am downloading a number of rows from an
> SQL database, and each row defines a range of numbers (100-200, 700-1500,
> etc.). What I am then doing on the NiFi side is generating an individual
> FlowFile for each number in that range. The way that I was accomplishing
> this was by setting a "current" attribute to the lower boundary and
> another attribute to the upper boundary, and then creating two queues off
> the "success" output of a Processor (the ReplaceText processor in the
> bottom right of the image). One of these goes on to process that number's
> record (going off the bottom right in the picture), and the other goes to
> a processor that increments the "current" number and then forwards it to
> the processor that checks that "current" is less than or equal to "upper
> boundary".
>
> This works great, until the queues end up filling up. Once this happens,
> I have a gridlock situation where none of the processors in this triangle
> are running any longer, because they all have a full output queue. I have
> tried searching the Internet and put a little thought into how I could
> make my "Check if done" processor prefer entries that are coming in from
> the circular portion of the graph, but so far haven't been able to come up
> with anything. What I have tried is making both of the input queues to
> "Check if done" go through a funnel and setting an Oldest FlowFile
> prioritizer, but it still eventually ends up filling up the entire
> triangle of queues.
>
> Does anyone have a suggestion as to how I could gracefully handle a
> situation like this? I appreciate any advice.
>
> Thanks!
>
> - Scott Wagner
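
For the ExecuteScript suggestion in the thread, the row expansion itself could look roughly like the following. This is only a minimal sketch in plain Python, not NiFi scripting API code: the function name `expand_ranges` and the one-range-per-line "lower-upper" input format are assumptions made for illustration, and the bounds are treated as inclusive to match the "less than or equal to upper boundary" check in the original flow.

```python
def expand_ranges(text):
    """Expand lines like "100-200" into one line per number (bounds inclusive).

    Hypothetical helper: assumes each non-blank line is "lower-upper"
    with non-negative integer bounds.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank rows
        lower_str, upper_str = line.split("-", 1)
        lower, upper = int(lower_str), int(upper_str)
        # range() excludes its end value, so add 1 to include the upper boundary
        out.extend(str(n) for n in range(lower, upper + 1))
    return "\n".join(out)


if __name__ == "__main__":
    # Two small ranges, printed one number per line
    print(expand_ranges("100-103\n700-701"))
```

Because the result has one number per line, a SplitText processor with Line Split Count set to 1 could then produce one flow file per number, with no loop in the graph to fill up.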
