Thank you Matt, this certainly does help. I implemented the back pressure thresholds today just as you suggested, and I had previously implemented the loop described by [1]. I am also employing (disabled) MonitorActivity processors to force failed FlowFiles to accumulate in the queues ahead of those processors, for review. The combination of these three approaches lets me throttle the flow when things go wrong and queue size reaches the threshold, and it gives me the insight I need to determine what is causing any issues in the first place. I can then react, fix, and get the show back on the road.
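For anyone following along, the retry loop from [1] boils down to counting attempts on the FlowFile and routing it somewhere new once the count is exhausted. Here is a minimal Python sketch of that logic; the attribute name, the `MAX_RETRIES` value, and the route names are my own illustrative choices, not NiFi API:

```python
# Sketch of the retry-count loop pattern from [1]: an UpdateAttribute-style
# counter plus a RouteOnAttribute-style decision. A FlowFile is modeled as a
# plain dict of attributes; all names here are hypothetical.

MAX_RETRIES = 3  # assumed limit, configurable in the real flow

def route_failed_flowfile(flowfile: dict) -> str:
    """Increment the retry counter and decide where the FlowFile goes next."""
    retries = int(flowfile.get("retry.count", 0)) + 1
    flowfile["retry.count"] = str(retries)
    if retries <= MAX_RETRIES:
        return "retry"   # loop back to the failing processor
    return "notify"      # give up, e.g. route toward an email notification

ff = {"filename": "data.csv"}
routes = [route_failed_flowfile(ff) for _ in range(4)]
# routes == ["retry", "retry", "retry", "notify"]
```

The key point is that the counter lives on the FlowFile itself, so each file is retried independently rather than the whole queue being blocked by one bad file.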
Not sure that this will be my final solution, but these are certainly steps in the right direction for me.

Regards,
Jim

On Wed, Aug 24, 2016 at 9:51 AM, Matthew Clarke <[email protected]> wrote:
> Hello James,
>
> Welcome to the NiFi community.
>
> Whether it is a good idea or a bad idea to loop a failure relationship
> back on a processor for retry depends entirely on the type of processor.
> For example, looping the failure relationship on a PutFile is a very good
> idea; however, looping failure on a processor like CompressContent
> (configured to decompress) may not be the best idea. A file that fails to
> decompress will likely continue to fail to decompress, creating a
> never-ending loop. It may also be desirable to build a dataflow loop [1].
> This will allow you to retry only so many times before taking some new
> action. That action might be sending out an email notification about the
> dataflow problem.
>
> The NiFi processors have a configurable "Penalty Duration". The default
> for this is 30 seconds. Any time a processor routes FlowFiles to a
> failure relationship, they are penalized for this duration of time.
> During the penalty period NiFi will ignore these penalized FlowFiles and
> work on other FlowFiles in the queue, coming back to these when the
> penalty expires. This helps with the race condition you mentioned.
>
> In your example of disk failure, you need to take other things into
> consideration. How much data can your NiFi afford to ingest before your
> disk fills? Have I followed best practices for deploying my NiFi
> instance? [2]
> Connections within NiFi provide a means for setting object or size back
> pressure thresholds. [3] This allows you to control, per connection, how
> many FlowFiles can queue before the source processor for that connection
> is no longer triggered to run.
> You can set back pressure on every connection all the way back to your
> dataflow ingest point(s) to essentially halt your dataflow before disks
> fill in the case of a major failure like you described. This also
> prevents one badly behaving dataflow on a canvas of many dataflows from
> taking over all resources.
>
> Hope this helps,
> Matt
>
> [1] https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
> [2] https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
> [3] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Connecting_Components
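To make the back pressure behavior Matt describes concrete, here is a small Python model of a single connection with an object-count threshold. The class, method names, and the threshold value are assumptions for illustration only; in real NiFi this is configured on the connection in the UI, not written as code:

```python
# Illustrative model of NiFi's per-connection object-count back pressure
# (see [3] in the quoted message). When the queue reaches its threshold,
# the upstream source processor is no longer scheduled, halting the flow
# at that point. All names here are hypothetical, not NiFi API.

class Connection:
    def __init__(self, object_threshold: int = 10):
        self.queue = []
        self.object_threshold = object_threshold

    def source_may_run(self) -> bool:
        # Back pressure engaged once the queue is at the threshold.
        return len(self.queue) < self.object_threshold

    def enqueue(self, flowfile) -> bool:
        if not self.source_may_run():
            return False  # source processor would not be triggered
        self.queue.append(flowfile)
        return True

conn = Connection(object_threshold=3)
accepted = [conn.enqueue(f"ff-{i}") for i in range(5)]
# accepted == [True, True, True, False, False]
```

Chaining such connections back to the ingest point is what lets the whole flow pause, rather than fill the disk, when a downstream processor stops making progress.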
