I have a lot of processors in my flow, all of which can, and do, route
flowfiles to their failure relationships at some point.

In the first iteration of my flow, I routed every failure relationship to
an inactive DebugFlow, but monitoring these was difficult: I wouldn't get
notified when something started to fail, and if a queue filled up it would
apply backpressure and prevent new, good flowfiles from being processed.

Not only was that a poor way to handle failures, but my flow was littered
with all of these do-nothing processors and was an eyesore. So then I
tried routing processors' failure relationships back into themselves,
which tidied up my flow but caused NiFi to go berserk whenever a failure
occurred, because the failure relationship is not penalized (nor should it
be) and most processors don't provide a 'Retry' relationship (InvokeHTTP
being a notable exception). But really, most processors wouldn't
conceivably succeed if they were tried again. I mostly just wanted the
flowfiles to sit there until I had a chance to find out why they failed
and fix them manually.

This leads me to https://issues.apache.org/jira/browse/NIFI-3351. I think I
need a way to store failed flowfiles, fix them, and reprocess them. The
process group I am currently considering implementing everywhere is:

Input Port [Failed Flowfile] --> PutS3Object deadletter/<failure location>/<failure reason>/${uuid} --> PutSlack
ListS3 deadletter/<failure location>/<failure reason>/ --> FetchS3Object --> Output Port [Fixed]
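To make the grouping concrete, the PutS3Object 'Object Key' property could be built with the Expression Language from attributes stamped on the flowfile before it enters the group (the attribute names `failure.location` and `failure.reason` here are just placeholders I'd set with an UpdateAttribute processor at each failure point; only `${uuid}` is a built-in attribute):

```
deadletter/${failure.location}/${failure.reason}/${uuid}
```

ListS3 would then use a matching 'Prefix' of `deadletter/` (or a narrower prefix per failure type) to pick the objects back up for reprocessing.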

This gives me storage of failed messages, logically grouped, and in a place
that won't block up my flow since S3 never goes down, err... wait.
Configurable process groups or templates like
https://issues.apache.org/jira/browse/NIFI-1096 would make this easier to
reuse.

How do you manage failure relationships?

- Nick
