Bryan, you managed to explain it in three sentences! :) now I get it!
thanks a bunch
On Tue, Mar 6, 2018 at 4:23 PM, Bryan Bende <bbe...@gmail.com> wrote:
> "Penalty Duration" is per flow file, and "Yield" is for the processor.
> If the processor penalizes a flow file and transfers it to a queue,
> whatever is processing that queue won't take that flow file from the
> queue until the penalty duration has passed.
> If a processor yields, then the framework won't execute that processor
> for up to the yield duration, which would mean it won't process any
> flow files during that time, which would include new incoming flow
> files as well as penalized flow files that might be in a self-loop.
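
A toy model may make the distinction concrete. This is only a sketch of the behavior described above, not NiFi code: the `FlowFile` and `Processor` classes, their fields, and the 5-minute / 1-minute values are all illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class FlowFile:
    name: str
    penalized_until: float = 0.0  # consumers skip this flow file before this time

@dataclass
class Processor:
    yielded_until: float = 0.0    # the processor is not run at all before this time
    queue: list = field(default_factory=list)

    def eligible(self, now):
        """Flow files this processor could pick up at time `now` (seconds)."""
        if now < self.yielded_until:
            return []  # yielded: nothing is processed, penalized or not
        return [f for f in self.queue if now >= f.penalized_until]

PENALTY = 300.0  # illustrative penalty duration, seconds
YIELD = 60.0     # illustrative yield duration, seconds

proc = Processor()

# t=0: an attempt fails; the flow file is penalized and put back on the
# self-loop, and the processor yields.
old = FlowFile("old", penalized_until=0.0 + PENALTY)
proc.queue.append(old)
proc.yielded_until = 0.0 + YIELD

# t=10: a fresh flow file arrives while the processor is still yielded.
proc.queue.append(FlowFile("new"))

def names(now):
    return [f.name for f in proc.eligible(now)]

print(names(30))   # during the yield: nothing is eligible
print(names(90))   # yield over, penalty still running: only the new flow file
print(names(301))  # penalty over: both flow files
```

In this model the yield blocks everything for its duration, while the penalty only holds back the one flow file it was applied to.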
> On Tue, Mar 6, 2018 at 4:01 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
> > Hi Mark,
> > thanks for your response! especially because I saw your name in that Jira :)
> > I think it makes sense to "keep trying until you're successful".
> > I am a bit confused by the "yield" and "penalize" parameters. Can you
> > give me an example of how they are used? Let's say I use an ExecuteSQL
> > processor and route its failure relationship back to itself. Then I set
> > Penalty Duration to 5 minutes and Yield Duration to 1 minute, and my
> > source database is down for maintenance at that time. I am playing with
> > different scenarios here, but I am not sure I understand what I am seeing.
> > I've read the docs 5 times now and am still confused :)
> > And what do you think about the solution proposed by Alessio? It looks
> > efficient to me and uses only one extra processor.
> > thanks,
> > Boris
> > On Tue, Mar 6, 2018 at 3:28 PM, Mark Payne <marka...@hotmail.com> wrote:
> >> Hey Boris,
> >> Using the UpdateAttribute and RouteOnAttribute approach is only needed
> >> when you want to retry N times (or for some time period) and, after
> >> that elapses, treat the data differently. Most of the time, though,
> >> people simply loop the 'failure' relationship back to the processor
> >> itself, so failures remain in the flow and are retried indefinitely.
> >> When a processor is unable to communicate with some external service
> >> due to an intermittent issue, that processor generally should "yield",
> >> meaning that the processor will not be scheduled to run for some
> >> amount of time (by default it is 1 second).
> >> So in this way, it's very simple to just say "keep trying until you're
> >> successful." You could also set "age-off" to occur so that if the data
> >> is more than, say, 1 hour old, it is automatically discarded.
> >> There are some situations, though, in which users will need to try,
> >> say, 10 times and then route the data differently. We could definitely
> >> improve that experience instead of having to use UpdateAttribute /
> >> RouteOnAttribute. But in my experience simply looping until successful
> >> is the most common approach, and so that's probably why we've not
> >> really seen much traction there.
> >> Thanks
> >> -Mark
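
The "retry N times, then route differently" pattern can be sketched in a few lines. This is a plain-Python illustration of the counting logic only, not NiFi configuration: the `retry.count` attribute name, the relationship names, and the limit of 10 are assumptions made for the example.

```python
MAX_RETRIES = 10  # illustrative limit

def route(attributes, succeeded):
    """Decide where a flow file goes after one attempt.

    Returns (relationship, updated_attributes). The counter increment plays
    the role of an UpdateAttribute step, and the comparison plays the role
    of a RouteOnAttribute step.
    """
    if succeeded:
        return "success", attributes
    count = int(attributes.get("retry.count", "0")) + 1   # hypothetical attribute
    updated = {**attributes, "retry.count": str(count)}
    if count >= MAX_RETRIES:
        return "give_up", updated   # treat the data differently from here on
    return "retry", updated         # loop back and try again

# Simulate a source that never comes back.
attrs = {}
history = []
while True:
    relationship, attrs = route(attrs, succeeded=False)
    history.append(relationship)
    if relationship != "retry":
        break

print(len(history), history[-1], attrs["retry.count"])
```

Dropping the counter and always returning "retry" on failure gives the simpler "loop until successful" behavior.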
> >> On Mar 6, 2018, at 3:02 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
> >> Just found this Jira
> >> https://issues.apache.org/jira/browse/NIFI-90
> >> I am surprised it has not got any traction after 3 years... Having used
> >> Apache Airflow for a while, I am looking for retry capabilities in NiFi,
> >> and it seems it comes down to a "build your own" flow approach that
> >> would handle retries in a loop and then sleep for some time. The best
> >> alternative solution I found was suggested by Alessio Palma:
> >> https://community.hortonworks.com/questions/56167/is-there-
> >> IMHO it still would be nice to have retry capabilities like those in
> >> Apache Airflow. You can specify a global retry behavior for a flow or
> >> specify options per task/processor. This helps a lot in dealing with
> >> intermittent issues, like losing a network connection or a source
> >> database system being down for maintenance. Airflow can also send an
> >> email on retry and supports a bunch of other parameters around retries:
> >> https://airflow.apache.org/code.html#baseoperator
> >> retries (int) – the number of retries that should be performed before
> >> failing the task
> >> retry_delay (timedelta) – delay between retries
> >> retry_exponential_backoff (bool) – allow progressive longer waits between
> >> retries by using exponential backoff algorithm on retry delay (delay
> >> will be converted into seconds)
> >> max_retry_delay (timedelta) – maximum delay interval between retries
> >> on_retry_callback – much like the on_failure_callback except that it is
> >> executed when retries occur.
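
To show how a base delay, exponential backoff, and a maximum delay interact, here is a minimal sketch of a capped doubling schedule. It is not Airflow's actual implementation, whose formula may differ in detail. The `backoff_delay` function is hypothetical, and only its parameter names mirror the ones listed above.

```python
from datetime import timedelta

def backoff_delay(attempt, retry_delay, max_retry_delay=None):
    """Delay before retry number `attempt` (1-based), doubling each time.

    A plain doubling schedule, capped at `max_retry_delay` when given.
    """
    delay = retry_delay * (2 ** (attempt - 1))
    if max_retry_delay is not None and delay > max_retry_delay:
        delay = max_retry_delay
    return delay

# Illustrative values: 30-second base delay, capped at 5 minutes.
delays = [
    backoff_delay(n, timedelta(seconds=30), timedelta(minutes=5))
    for n in range(1, 7)
]
print([int(d.total_seconds()) for d in delays])  # [30, 60, 120, 240, 300, 300]
```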
> >> Is everyone using the UpdateAttribute, RouteOnAttribute and Sleep
> >> method to implement retries?
> >> thanks,
> >> Boris