Re: NiFi retry capabilities

2018-03-06 Thread Boris Tyukin
Bryan, you managed to explain it in three sentences! :) Now I get it!
Thanks a bunch.


On Tue, Mar 6, 2018 at 4:23 PM, Bryan Bende  wrote:

> Boris,
>
> "Penalty Duration" is per flow file, and "Yield" is for the processor.
>
> If the processor penalizes a flow file and transfers it to a queue,
> whatever is processing that queue won't take that flow file from the
> queue until the penalty duration has passed.
>
> If a processor yields, then the framework won't execute that processor
> for up to the yield duration, which would mean it won't process any
> flow files during that time, which would include new incoming flow
> files as well as penalized flow files that might be in a self-loop.
>
> -Bryan


Re: NiFi retry capabilities

2018-03-06 Thread Bryan Bende
Boris,

"Penalty Duration" is per flow file, and "Yield" is for the processor.

If the processor penalizes a flow file and transfers it to a queue,
whatever is processing that queue won't take that flow file from the
queue until the penalty duration has passed.

If a processor yields, then the framework won't execute that processor
for up to the yield duration, which would mean it won't process any
flow files during that time, which would include new incoming flow
files as well as penalized flow files that might be in a self-loop.

-Bryan
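To make the distinction concrete, here is a toy, discrete-time Python model of Boris's scenario. It is not NiFi code and not how NiFi's scheduler actually works. The model has a processor whose failure relationship loops back to itself, a Penalty Duration of 5 minutes, a Yield Duration of 1 minute, and a database that comes back at minute 12 (our assumption). It also assumes the processor both penalizes the flow file and yields on every failure. `FlowFile` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FlowFile:
    name: str
    penalized_until: float = 0.0  # per flow file: not taken from the queue before this


def simulate(db_up_at: float, penalty: float, yield_duration: float,
             tick: float = 1.0, horizon: float = 60.0) -> list[float]:
    """Return the times (minutes) at which the processor attempted the flow file.

    Toy model: on each failure the processor penalizes the flow file,
    routes it back to its own input queue (self-loop), and yields.
    """
    queue = [FlowFile("ff-1")]
    yielded_until = 0.0  # per processor: not scheduled before this
    attempts = []
    now = 0.0
    while now <= horizon and queue:
        if now >= yielded_until:  # a yielded processor is not triggered at all
            eligible = [ff for ff in queue if now >= ff.penalized_until]
            if eligible:
                ff = eligible[0]
                queue.remove(ff)
                attempts.append(now)
                if now < db_up_at:  # database still down: the attempt fails
                    ff.penalized_until = now + penalty
                    queue.append(ff)  # failure relationship loops back
                    yielded_until = now + yield_duration
        now += tick
    return attempts


# Penalty 5 min, yield 1 min: the penalty dominates, so retries are 5 minutes apart.
print(simulate(db_up_at=12, penalty=5, yield_duration=1))  # [0.0, 5.0, 10.0, 15.0]
# No penalty, yield 1 min: the yield alone throttles retries to once per minute.
print(len(simulate(db_up_at=12, penalty=0, yield_duration=1)))  # 13
```

In the first run the processor does yield, but its 1-minute yield expires long before the flow file's 5-minute penalty, so the penalty decides when the next attempt happens; other flow files waiting in the same queue could be processed in between. In the second run there is no penalty, and the yield alone limits the processor to one attempt per minute, which would also hold back any new incoming flow files.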



On Tue, Mar 6, 2018 at 4:01 PM, Boris Tyukin  wrote:
> Hi Mark,
>
> thanks for your response! especially because I saw your name in that Jira :)
>
> I think it makes sense to "keep trying until you're successful".
>
> I am a bit confused by "yield" and "penalize" parameters. Can you give me an
> example how they are used? Let's say, I use ExecuteSQL processor and route
> failure rel to itself. Then I set Penalty duration to 5 minutes and Yield
> duration to 1 minute and my source database is down for maintenance at the
> time. I am playing with different scenarios here but not sure I understand
> what I am seeing.
>
> I've read the docs 5 times now and still confused :)
>
> And what do you think about that solution, proposed by Alessio? looks simple
> and efficient to me and uses only one extra processor
>
> thanks,
> Boris


Re: NiFi retry capabilities

2018-03-06 Thread Boris Tyukin
Hi Mark,

Thanks for your response, especially since I saw your name on that Jira :)

I think it makes sense to "keep trying until you're successful".

I am a bit confused by the "yield" and "penalize" parameters. Can you give me
an example of how they are used? Let's say I use the ExecuteSQL processor and
route its failure relationship back to itself. Then I set Penalty Duration to
5 minutes and Yield Duration to 1 minute, and my source database is down for
maintenance at the time. I am playing with different scenarios here, but I'm
not sure I understand what I am seeing.

I've read the docs 5 times now and am still confused :)

And what do you think about the solution proposed by Alessio? It looks
simple and efficient to me and uses only one extra processor.

thanks,
Boris

On Tue, Mar 6, 2018 at 3:28 PM, Mark Payne  wrote:

> Hey Boris,
>
> Using the UpdateAttribute and RouteOnAttribute approach is only necessary
> when you want
> to retry N number of times (or for some time period) and after that
> elapses to treat the data
> differently. Most of the time, though, what is used is to simply loop the
> 'failure' relationship back
> to the processor itself. So failures would simply remain in the flow,
> trying indefinitely. When a processor
> is unable to communicate with some external service due to some
> intermittent issue, that processor
> generally should "yield", meaning that the processor will not be triggered
> for some amount of time
> (by default it is 1 second).
>
> So in this way, it's very simple to just say "keep trying until you're
> successful." You could also set "age-off"
> to occur so that if the data is more than say 1 hour old you can have nifi
> automatically just discard the data.
>
> There are some situations, though, in which users will need to try for say
> 10 times and then route the data differently.
> We could definitely improve that experience instead of having to use
> UpdateAttribute / RouteOnAttribute. But from
> my experience simply looping until successful is the most common scenario
> and so that's probably why we've not
> really seen much traction there.
>
> Thanks
> -Mark


Re: NiFi retry capabilities

2018-03-06 Thread Mark Payne
Hey Boris,

Using the UpdateAttribute and RouteOnAttribute approach is only necessary when
you want to retry N number of times (or for some time period) and, once that
elapses, treat the data differently. Most of the time, though, users simply
loop the 'failure' relationship back to the processor itself. Failures then
remain in the flow, retrying indefinitely. When a processor is unable to
communicate with some external service due to an intermittent issue, that
processor generally should "yield", meaning that the processor will not be
triggered for some amount of time (by default, 1 second).
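Mark's "keep trying until you're successful" pattern can be sketched as a plain Python loop (this is not NiFi code). A hypothetical `attempt` callable stands in for the processor's call to the external service, and sleeping for `yield_seconds` stands in for the processor yielding, using the 1-second default Mark mentions. `sleep` is injectable so the example can run against a fake clock instead of actually waiting.

```python
import itertools
import time
from typing import Callable


def retry_until_success(attempt: Callable[[], bool],
                        yield_seconds: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call attempt() until it returns True; return how many attempts it took.

    Hypothetical stand-in for a failure relationship looped back to the
    processor: every failure is followed by a "yield" of yield_seconds.
    """
    for n in itertools.count(1):
        if attempt():
            return n
        sleep(yield_seconds)  # like a yield: no new attempt during this time


# Fake clock: the external service is down for the first 10 minutes.
clock = {"now": 0.0}


def fake_sleep(seconds: float) -> None:
    clock["now"] += seconds


def query_database() -> bool:
    return clock["now"] >= 600.0


print(retry_until_success(query_database, sleep=fake_sleep))  # 601
```

With the 1-second default, a 10-minute outage in this sketch costs about 600 failed attempts, so the yield duration largely determines how hard the flow hits the service while it is down.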

So in this way, it's very simple to just say "keep trying until you're
successful." You could also configure age-off (FlowFile Expiration on the
connection) so that if the data is more than, say, 1 hour old, NiFi
automatically discards it.
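The age-off Mark describes can be modeled as a filter over a queue. This is a toy Python model, not NiFi's implementation. It assumes, as we read the NiFi User Guide, that a flow file's age is measured from when the data entered the NiFi instance, and it treats an age exactly equal to the threshold as expired, a boundary choice made here for illustration. `FlowFile` and `age_off` are hypothetical names.

```python
from dataclasses import dataclass

ONE_HOUR = 3600.0  # seconds


@dataclass
class FlowFile:
    name: str
    entered_at: float  # seconds; when the data entered the NiFi instance


def age_off(queue: list[FlowFile], now: float,
            expiration: float = ONE_HOUR) -> tuple[list[FlowFile], list[FlowFile]]:
    """Split a queue into (kept, expired) by flow file age."""
    kept = [ff for ff in queue if now - ff.entered_at < expiration]
    expired = [ff for ff in queue if now - ff.entered_at >= expiration]
    return kept, expired


queue = [FlowFile("a", entered_at=0.0), FlowFile("b", entered_at=1800.0),
         FlowFile("c", entered_at=3000.0)]
kept, expired = age_off(queue, now=3700.0)
print([ff.name for ff in kept], [ff.name for ff in expired])  # ['b', 'c'] ['a']
```

In a retry self-loop, this is what bounds "keep trying until you're successful": a flow file that keeps failing is eventually discarded instead of being retried forever.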

There are some situations, though, in which users will need to try, say, 10
times and then route the data differently. We could definitely improve that
experience so that users don't have to use UpdateAttribute / RouteOnAttribute.
But in my experience, simply looping until successful is the most common
scenario, which is probably why we haven't really seen much traction there.
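The UpdateAttribute / RouteOnAttribute approach keeps a retry counter as a flow file attribute. One common way to write it in NiFi Expression Language is `${retry.count:replaceNull(0):plus(1)}` in UpdateAttribute and `${retry.count:gt(10)}` as a RouteOnAttribute property; the attribute name `retry.count` is our own choice, not a NiFi convention. The Python function below is a hypothetical sketch of the same decision, not NiFi code. Attribute values are kept as strings, as they are in NiFi.

```python
MAX_RETRIES = 10  # "try for say 10 times" from the thread


def route_failure(attributes: dict[str, str], max_retries: int = MAX_RETRIES) -> str:
    """Handle one failure: bump the counter, then decide where the flow file goes.

    Returns "retry" (loop back to the processor) or "give_up" (route elsewhere).
    """
    # UpdateAttribute step: a missing counter counts as 0, then add 1.
    count = int(attributes.get("retry.count", "0")) + 1
    attributes["retry.count"] = str(count)
    # RouteOnAttribute step: give up once the counter exceeds the limit.
    return "give_up" if count > max_retries else "retry"


attrs: dict[str, str] = {}
routes = [route_failure(attrs) for _ in range(11)]
print(routes.count("retry"), routes[-1], attrs["retry.count"])  # 10 give_up 11
```

With `gt(10)`, the flow file is retried 10 times after the initial failure and routed away on the 11th failure; whether the limit should count the first attempt is a design choice worth making explicit.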

Thanks
-Mark



On Mar 6, 2018, at 3:02 PM, Boris Tyukin wrote:

Just found this Jira
https://issues.apache.org/jira/browse/NIFI-90

I am surprised it has not got any traction after 3 years...Having used Apache
Airflow for a while, I am looking to retry capabilities in NiFi and it seems it
comes down to "build your own" flow approach, that would handle retries in a
loop and then sleeping for some time. The best alternative solution I found was
suggested by Alessio Palma
https://community.hortonworks.com/questions/56167/is-there-wait-processor-in-nifi.html

IMHO it still would be nice to have retry capabilities like with Apache 
Airflow. You can specify a global retry behavior for a flow or specify retry 
options per task/processor. This helps a lot to deal with intermittent issues, 
like losing network connection or source database system, being down for 
maintenance. Airflow can also send an email on retry and supports a bunch of 
other parameters around retries:

https://airflow.apache.org/code.html#baseoperator


  *   retries (int) – the number of retries that should be performed before
      failing the task
  *   retry_delay (timedelta) – delay between retries
  *   retry_exponential_backoff (bool) – allow progressive longer waits between
      retries by using exponential backoff algorithm on retry delay (delay will
      be converted into seconds)
  *   max_retry_delay (timedelta) – maximum delay interval between retries

  *   on_retry_callback – much like the on_failure_callback except that it is
      executed when retries occur.

Is everyone using UpdateAttribute and RouteOnAttribute and Sleep method to 
implement retries?

thanks,
Boris



NiFi retry capabilities

2018-03-06 Thread Boris Tyukin
Just found this Jira
https://issues.apache.org/jira/browse/NIFI-90

I am surprised it hasn't gotten any traction after 3 years... Having used
Apache Airflow for a while, I am looking for retry capabilities in NiFi, and
it seems to come down to a "build your own" flow approach that handles
retries in a loop, sleeping for some time between attempts. The best
alternative solution I found was suggested by Alessio Palma:

https://community.hortonworks.com/questions/56167/is-there-wait-processor-in-nifi.html


IMHO it would still be nice to have retry capabilities like Apache Airflow's.
There you can specify a global retry behavior for a flow or specify retry
options per task/processor. This helps a lot with intermittent issues, like
losing a network connection or a source database system being down for
maintenance. Airflow can also send an email on retry and supports a bunch of
other parameters around retries:

https://airflow.apache.org/code.html#baseoperator


   - *retries* (*int*) – the number of retries that should be performed
   before failing the task
   - *retry_delay* (*timedelta*) – delay between retries
   - *retry_exponential_backoff* (*bool*) – allow progressive longer waits
   between retries by using exponential backoff algorithm on retry delay
   (delay will be converted into seconds)
   - *max_retry_delay* (*timedelta*) – maximum delay interval between
   retries


   - *on_retry_callback* – much like the on_failure_callback except that it
   is executed when retries occur.
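For comparison, the effect of `retry_delay`, `retry_exponential_backoff`, and `max_retry_delay` can be sketched as a generic capped exponential backoff, where the delay doubles on each retry until it hits the cap. This is a simplified illustration, not Airflow's exact formula (Airflow's own calculation differs in details), and `backoff_delays` is a hypothetical helper.

```python
from datetime import timedelta
from typing import Optional


def backoff_delays(retries: int, retry_delay: timedelta,
                   max_retry_delay: Optional[timedelta] = None) -> list[timedelta]:
    """Delay before retry n (1-based): retry_delay * 2**(n - 1), optionally capped."""
    delays = []
    for n in range(1, retries + 1):
        delay = retry_delay * (2 ** (n - 1))  # doubles on every retry
        if max_retry_delay is not None:
            delay = min(delay, max_retry_delay)  # never wait longer than the cap
        delays.append(delay)
    return delays


# 5 retries, starting at 1 minute, capped at 10 minutes.
delays = backoff_delays(5, timedelta(minutes=1), timedelta(minutes=10))
print([d.total_seconds() / 60 for d in delays])  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

With backoff disabled, every retry would simply wait `retry_delay`, which is roughly what a fixed Penalty Duration on a NiFi self-loop gives you.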

Is everyone using the UpdateAttribute + RouteOnAttribute + sleep approach to
implement retries?

thanks,
Boris