Hi Harald, Mark,

I asked the other day about RetryFlowFile and its potential danger, but have had no answer yet. My question was not really about penalty and yield; I just wanted to raise it for consideration.
@Harald, if the Retry in your diagram is a RetryFlowFile processor, there is a potential for deadlock sooner or later if a lot of files pass through this point in your flow. Imagine there is a large number of FlowFiles and the “unreliable” endpoint you mentioned is asleep for a while. All the FlowFiles go to the failure relationship, and after some time (depending on how you configured the number of retries in the RetryFlowFile processor) they go to the retry relationship to try the endpoint again. If both of those relationships are full up to the back pressure threshold, there will be a deadlock, and even if the endpoint wakes up, NiFi will not try it.

Regarding your “slow down” question: RetryFlowFile has an option to penalize FlowFiles before sending them to the retry relationship.

Thanks,
Regards,
Tom

From: Dobbernack, Harald (Key-Work) <[email protected]>
Sent: 23 April 2021 09:50
To: [email protected]
Subject: Re: Some retry flowfile questions

Mark,

thank you so much for this great explanation!

Harald

From: Mark Payne <[email protected]>
Sent: Thursday, 22 April 2021 22:32
To: [email protected]
Subject: Re: Some retry flowfile questions

Geoff,

The difference between penalization and yielding is whether the failure is data-dependent or not.

An easy way to think about this is to consider a scenario where you have a simple flow: GetFTP -> PutFTP. Something else is picking up data from the FTP server that you’re pushing to. You know that sometimes a file with the same name will already exist, but you don’t want to overwrite it, because it’s likely to be different data with a conflicting filename. So you want to wait a while and try to push that file again. In the meantime, you want to continue pushing other files to the FTP server. In this case, the processor would penalize that FlowFile so that it can continue working on other data.
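As a rough illustration of the penalize case above, set against the yield case that the rest of this message goes on to describe, here is a toy model in plain Python. It is not NiFi code: `ToyProcessor`, `PENALTY`, `YIELD` and the outcome strings are all made up for this sketch, and the durations are arbitrary.

```python
from collections import deque

PENALTY = 30   # hypothetical penalty duration, in seconds
YIELD = 10     # hypothetical yield duration, in seconds


class ToyProcessor:
    """Toy stand-in for a processor such as PutFTP; not the NiFi API."""

    def __init__(self, send):
        # send: callable taking a FlowFile name and returning
        # "ok", "conflict" (data-dependent) or "unreachable" (not data-dependent)
        self.send = send
        self.queue = deque()        # entries are (name, penalized_until)
        self.yielded_until = 0
        self.delivered = []

    def enqueue(self, name):
        self.queue.append((name, 0))

    def run_once(self, now):
        """One scheduling pass over the queue at time `now`."""
        if now < self.yielded_until:
            return                  # processor has yielded: nothing is attempted
        for _ in range(len(self.queue)):
            name, penalized_until = self.queue.popleft()
            if now < penalized_until:
                # Still penalized: put it back and move on to other data.
                self.queue.append((name, penalized_until))
                continue
            outcome = self.send(name)
            if outcome == "ok":
                self.delivered.append(name)
            elif outcome == "conflict":
                # Data-dependent failure: penalize only this FlowFile.
                self.queue.append((name, now + PENALTY))
            else:
                # Failure unrelated to the data: put the FlowFile back
                # at the front and yield the whole processor.
                self.queue.appendleft((name, penalized_until))
                self.yielded_until = now + YIELD
                return
```

With a `send` that reports a conflict for one file only, the other files are still delivered in the same pass; with a `send` that always reports "unreachable", the first attempt stops the pass and nothing else is tried until the yield expires.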
On the other hand, if PutFTP were to get a connection failure, so that it’s not even able to connect to that FTP server, then it doesn’t make sense to penalize that FlowFile, move on to the next one and try to push it. It can’t connect, so it can’t make progress regardless of what data it has. In this case, the processor should yield.

Note, however, that it is up to the processor developer to tell the processor to yield or to penalize the FlowFile. It’s not up to the creator of the data flow.

Does that help?

Thanks
-Mark

On Apr 22, 2021, at 2:08 PM, Greene (US), Geoffrey N <[email protected]> wrote:

We have a REST endpoint that is “unreliable”. It works sometimes. When it doesn’t work, the solution seems to be to sleep for a while, then try again.

So I put in a retry processor:

    http processor <- Retry
      |        \        ^
    Success   Failure --|

So far, so good, that loop works. But how do I handle the slow down? Does the penalty / yield go on the retry? Or on the http? What’s the difference? How do I know if I should YIELD or impose a penalty? I’m not sure I understand the differences here.

Thanks
Geoff

Harald Dobbernack
Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany | www.key-work.de | Privacy policy: https://www.key-work.de/de/footer/datenschutz.html
Phone: +49-721-78203-264 | E-Mail: [email protected]
Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
Managing directors: Andreas Stappert, Tobin Wotring
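The deadlock described at the top of this thread can be pictured with a small toy model of the loop in the diagram. This is only a sketch under stated assumptions, not NiFi code: `THRESHOLD` and `step` are hypothetical names, the scheduling rule (a processor is not run while its outgoing connection is at the back pressure threshold) is a simplification, and the model ignores new data arriving from upstream as well as any exit path for FlowFiles that run out of retries.

```python
from collections import deque

THRESHOLD = 3   # hypothetical back pressure object threshold per connection


def step(failure, retry, delivered, endpoint_up):
    """Run the http processor and then the retry processor once each.

    failure:   queue from the http processor to the retry processor
    retry:     queue from the retry processor back to the http processor
    delivered: list standing in for the success connection (never full here)
    Returns the number of FlowFiles that moved.
    """
    moved = 0
    # http processor: assumed to run only while `failure` is below the threshold.
    if retry and len(failure) < THRESHOLD:
        flowfile = retry.popleft()
        (delivered if endpoint_up else failure).append(flowfile)
        moved += 1
    # retry processor: assumed to run only while `retry` is below the threshold.
    if failure and len(retry) < THRESHOLD:
        retry.append(failure.popleft())
        moved += 1
    return moved


# Both connections are at the threshold and the endpoint is back up,
# yet neither processor is allowed to run, so nothing moves.
failure = deque(["f1", "f2", "f3"])
retry = deque(["r1", "r2", "r3"])
delivered = []
stuck = step(failure, retry, delivered, endpoint_up=True) == 0
```

If either queue has room, the same call makes progress, which is why in this model the situation only arises once both connections in the loop have filled up.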
