If I understand correctly, your goal is that for each input row that 
specifies a range, A to A+N, you generate a sequence of N (or perhaps 
N+1) flowfiles, right?  And the only difference between the flowfiles is that 
you’ve replaced the range specification with a single number from that range?

 

I would suggest that at the level of the row input, you use ExecuteScript to 
expand each input row into N rows, with the substituted number values, then run 
that through SplitText, to get one row per flowfile.  This should be way more 
efficient, as well as much safer than a cyclic graph.
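
To make the expansion step concrete, here is a rough sketch of the row logic 
only.  I am assuming a made-up row format in which the last comma-separated 
field is an inclusive, non-negative range such as 100-200; the function names 
expand_row and expand_rows are hypothetical, and the ExecuteScript wiring 
(reading and writing the flowfile content) is left out.  Adjust it to whatever 
your rows actually look like.

```python
def expand_row(row, sep=","):
    """Expand one row into one row per number in its range.

    Assumption (not from the original flow): the last field of the row is an
    inclusive, non-negative range written as LOW-HIGH, e.g. "100-200".
    """
    fields = row.rstrip("\n").split(sep)
    prefix, range_spec = fields[:-1], fields[-1]
    low, high = (int(part) for part in range_spec.split("-"))
    # Inclusive range: A to A+N yields N+1 rows.
    return [sep.join(prefix + [str(n)]) for n in range(low, high + 1)]


def expand_rows(text, sep=","):
    """Expand every non-blank line of a multi-row text block."""
    out = []
    for line in text.splitlines():
        if line.strip():
            out.extend(expand_row(line, sep))
    # One expanded row per line, so a line-based splitter can cut them apart.
    return "".join(row + "\n" for row in out)
```

The output has one expanded row per line, so a SplitText with a line split 
count of 1 after it should give you one row per flowfile, with no loop in the 
graph.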

 

Cheers,

--Matt

 

From: Scott Wagner <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, February 27, 2017 at 2:34 PM
To: "[email protected]" <[email protected]>
Subject: How to gracefully handle a circular graph?

 

Hello all,

    I have created a graph where I am downloading a number of rows from an SQL 
database, and each row defines a range of numbers (100-200, 700-1500, etc.).  
What I am then doing on the NiFi side is generating an individual FlowFile for 
each number in that range.  The way that I was accomplishing this was by 
setting a "current" attribute to the lower boundary and another attribute to 
the upper boundary, and then creating two queues off the "success" output of a 
Processor (the ReplaceText processor in the bottom right of the image).  One 
queue goes on to process that number's record (going off the bottom right in 
the picture), and the other goes to a processor that increments the "current" 
number and then forwards it to the processor that checks that "current" is 
less than or equal to "upper boundary".

    This works great, until the queues end up filling up.  Once this happens, I 
have a gridlock situation where none of the processors in this triangle are 
running any longer, because they all have a full output queue.  I have tried 
searching the Internet and put a little thought into how I could make it so 
that my "Check if done" processor would prefer entries that are coming in from 
the circular portion of the graph, but so far haven't been able to come up with 
anything.  What I have tried is making both of the input queues to "Check if 
done" go through a funnel, and set an Oldest FlowFile prioritizer, but it still 
eventually ends up filling up the entire triangle of queues.



    Does anyone have a suggestion as to how I could gracefully handle a 
situation like this?  I appreciate any advice.

Thanks!

- Scott Wagner
