RE: Some retry flowfile questions

Tomislav Novosel Fri, 23 Apr 2021 02:18:35 -0700

Hi Harald, Mark,

I asked about RetryFlowFile and its potential danger the other day, but have 
had no answer yet.
My question was not really about penalty and yield, but I think it is worth 
considering here.


@Harald, if the Retry step in your diagram is a RetryFlowFile processor, then 
sooner or later there is potential for a deadlock if a lot of files are going 
through this point in your flow.

Imagine there is a large number of flowfiles and the “unreliable” endpoint you 
mentioned is down for a while. All the flowfiles go to the failure 
relationship, and after some time (depending on how you configured the number 
of retries in the RetryFlowFile processor) they go to the retry relationship to 
try the endpoint again.

If both of those relationships are full up to the backpressure threshold, there 
will be a deadlock, and even if the endpoint wakes up, NiFi will not try it.
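
A minimal sketch of that situation, as a toy model rather than NiFi code. It 
assumes a simplified scheduling rule (a processor runs only if it has input and 
none of its outbound connections is at its threshold); the class and function 
names and the threshold of 3 are hypothetical and chosen only for illustration.

```python
from collections import deque


class Connection:
    """Toy stand-in for a NiFi connection with an object-count threshold."""

    def __init__(self, threshold):
        self.queue = deque()
        self.threshold = threshold

    def is_full(self):
        return len(self.queue) >= self.threshold


def can_run(inbound, outbounds):
    # Assumed rule: needs queued input, and no outbound connection may be full.
    return len(inbound.queue) > 0 and not any(c.is_full() for c in outbounds)


def loop_is_deadlocked(failure, retry, success):
    # HTTP processor reads from "retry" and writes to "failure" or "success".
    http_can_run = can_run(inbound=retry, outbounds=[failure, success])
    # Retry processor reads from "failure" and writes to "retry".
    retry_can_run = can_run(inbound=failure, outbounds=[retry])
    has_work = len(failure.queue) + len(retry.queue) > 0
    return has_work and not http_can_run and not retry_can_run


failure, retry, success = Connection(3), Connection(3), Connection(3)
for i in range(3):
    failure.queue.append(f"ff-{i}")
    retry.queue.append(f"ff-{i + 3}")

# Both loop connections are full: neither processor is allowed to run.
stuck = loop_is_deadlocked(failure, retry, success)

# Freeing a single slot in "failure" lets the HTTP processor run again.
failure.queue.popleft()
after_drain = loop_is_deadlocked(failure, retry, success)

print(stuck, after_drain)
```

In this model the loop stays stuck no matter what the endpoint does, because 
the decision to run depends only on the queue sizes.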

Regarding your “slow down” question: RetryFlowFile has an option to penalize 
flowfiles before sending them to the retry relationship.

Thanks,
Regards,
Tom

From: Dobbernack, Harald (Key-Work) <[email protected]>
Sent: 23 April 2021 09:50
To: [email protected]
Subject: AW: Some retry flowfile questions

Mark, thank you so much for this great explanation!
Harald

From: Mark Payne <[email protected]>
Sent: Thursday, 22 April 2021 22:32
To: [email protected]
Subject: Re: Some retry flowfile questions

Geoff,

The difference between penalization and yielding is whether the failure is 
data-dependent or not.

So, an easy way to think about this is to consider a scenario where you have a 
simple flow: GetFTP -> PutFTP.
Something else is picking up data from the FTP server that you’re putting to.

You know that sometimes the data will already exist with the same name, but you 
don’t want to overwrite it because it’s likely to actually be different data 
with a conflicting filename.
So you want to wait a while and try to push that file again. In the meantime, 
you want to continue pushing other files to the FTP server.
In this case, the processor would penalize that FlowFile so that it can 
continue working on other data.

On the other hand, if PutFTP were to get a connection failure, so that it is 
not even able to connect to that FTP server, then it doesn’t make sense to 
penalize that FlowFile, move on to the next one, and try to push it. It can’t 
connect, so it can’t make progress regardless of what data it has.
In this case, the processor should yield.

Note, however, that it is up to the processor developer to tell the processor 
to yield or to penalize the FlowFile. It’s not up to the creator of the data 
flow.
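
The distinction can be sketched with a toy model (plain Python, not the NiFi 
API). Everything here is an assumption made for illustration: the class and 
method names, the 30-second penalty and 1-second yield durations, and the 
simplification that a penalized FlowFile stays in the same queue instead of 
being routed to a relationship.

```python
from dataclasses import dataclass


@dataclass
class FlowFile:
    name: str
    penalized_until: float = 0.0  # per-FlowFile state


class ToyPutProcessor:
    def __init__(self, penalty=30.0, yield_duration=1.0):
        self.penalty = penalty
        self.yield_duration = yield_duration
        self.yielded_until = 0.0  # per-processor state

    def on_trigger(self, queue, now, server_up, existing_names):
        """Handle at most one eligible FlowFile and report what happened."""
        if now < self.yielded_until:
            return "skipped (yielding)"
        for ff in queue:
            if ff.penalized_until <= now:
                break  # first FlowFile that is not penalized
        else:
            return "idle"
        if not server_up:
            # Not data-dependent: no FlowFile can succeed, so back off entirely.
            self.yielded_until = now + self.yield_duration
            return "yield"
        if ff.name in existing_names:
            # Data-dependent: only this FlowFile waits, others keep flowing.
            ff.penalized_until = now + self.penalty
            return f"penalized {ff.name}"
        queue.remove(ff)
        existing_names.add(ff.name)
        return f"sent {ff.name}"


proc = ToyPutProcessor()
queue = [FlowFile("a.txt"), FlowFile("b.txt"), FlowFile("c.txt")]
existing = {"a.txt"}  # a.txt already exists on the server

r1 = proc.on_trigger(queue, now=0.0, server_up=True, existing_names=existing)
r2 = proc.on_trigger(queue, now=1.0, server_up=True, existing_names=existing)
r3 = proc.on_trigger(queue, now=2.0, server_up=False, existing_names=existing)
r4 = proc.on_trigger(queue, now=2.5, server_up=True, existing_names=existing)
print(r1, r2, r3, r4, sep="\n")
```

The name conflict delays only a.txt while b.txt still goes through, whereas the 
connection failure pauses the whole processor, including the attempt at 2.5 
seconds when the server is already back.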

Does that help?

Thanks
-Mark

On Apr 22, 2021, at 2:08 PM, Greene (US), Geoffrey N <[email protected]> wrote:

We have a REST endpoint that is “unreliable”. It works sometimes.
When it doesn’t work, the solution seems to be to sleep for a while, then try 
again.

So I put in a retry processor:

http processor    <-  Retry
   |      \             ^
Success  Failure  -----|

So far, so good, that loop works.  But how do I handle the slowdown?
Does the penalty / yield go on the retry? Or on the http?  What’s the 
difference?  How do I know if I should YIELD or impose a penalty? I’m not sure 
I understand the differences here.

Thanks
Geoff



