Re: Need help understanding backpressure

Mark Payne Wed, 20 Apr 2016 12:11:24 -0700

Chris,

I did create a ticket for this [1].


Just wanted to reply to the email to close the loop on this. Feel free to 
update the ticket with any other thoughts
or ideas that you may have.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-1793 



> On Apr 19, 2016, at 11:51 AM, McDermott, Chris Kevin (MSDU - 
> STaTS/StorefrontRemote) <[email protected]> wrote:
> 
> Not exactly what I was thinking, but this is better!
> 
> Thanks,
> Chris
> 
> 
> 
> 
> On 4/19/16, 8:44 AM, "Mark Payne" <[email protected]> wrote:
> 
>> Chris,
>> 
>> I like the idea of providing a way to enforce backpressure based on how full 
>> the content or FlowFile repository is.
>> I would imagine that this would be something that we would also configure on 
>> a connection, just like the other backpressure
>> is configured, so that we could allow, for example, more "important" or more 
>> time-sensitive data to come into the flow
>> even if the repository is 90% full whereas other data may not be allowed to 
>> enter once the repo hits 60% full, so that we
>> ensure that we have room for the other data.
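
A minimal sketch of the idea in the paragraph above, assuming each connection carries its own repository-usage limit. This is only the proposal under discussion, not an existing NiFi option: the function names, the connection names, and the `limits` mapping are hypothetical. Only `shutil.disk_usage` is a real standard-library call.

```python
import shutil
from typing import Dict, List, Optional


def repository_usage(path: str) -> float:
    """Fraction (0.0 to 1.0) of the partition holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def accepting_connections(usage: float,
                          limits: Dict[str, Optional[float]]) -> List[str]:
    """Names of connections still allowed to accept data at this usage level.

    `limits` maps a (hypothetical) connection name to the usage fraction at
    which backpressure would be applied; None means no repository-based limit.
    """
    return sorted(
        name for name, limit in limits.items()
        if limit is None or usage < limit
    )


# Thresholds taken from the example in the message: 90% and 60%.
limits = {"time-sensitive feed": 0.90, "bulk feed": 0.60}

print(accepting_connections(0.50, limits))  # ['bulk feed', 'time-sensitive feed']
print(accepting_connections(0.75, limits))  # ['time-sensitive feed']
print(accepting_connections(0.95, limits))  # []
```

In a real check the first argument would come from something like `repository_usage("/path/to/content_repository")`, where the path is whatever directory the repository actually lives in.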
>> 
>> Is this what you had in mind?
>> 
>> Thanks
>> -Mark
>> 
>> 
>>> On Apr 15, 2016, at 2:38 PM, McDermott, Chris Kevin (MSDU - 
>>> STaTS/StorefrontRemote) <[email protected]> wrote:
>>> 
>>> Thanks for the clarification and explanation of the design philosophy. It 
>>> does make sense. I think it comes down to me trying to use back pressure 
>>> for a purpose for which it was not designed.
>>> 
>>> What if there were a way to configure a processor to be paused when 
>>> available disk space drops below some threshold?  That way ingress 
>>> processors, as identified by the user, could be prevented from flooding the 
>>> system with too much data.  Thoughts?
>>> 
>>> Chris
>>> 
>>> 
>>> 
>>> 
>>> On 4/15/16, 1:12 PM, "Mark Payne" <[email protected]> wrote:
>>> 
>>>> Chris,
>>>> 
>>>> When you apply backpressure to that connection, it will cause the 
>>>> processor that is
>>>> the source of the connection to stop being scheduled to run until the 
>>>> queue clears out.
>>>> However, as you noted, data will still queue up in that processor's 
>>>> incoming connections.
>>>> So to force backpressure to propagate all the way back to the source, you 
>>>> would
>>>> need to configure each of the connections in the flow to have backpressure 
>>>> applied.
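
The behaviour described above can be illustrated with a toy simulation of a linear flow. This is a sketch under stated assumptions, not NiFi code: the `Connection` class, the `run` function, the one-item-per-tick scheduling, and the threshold of 5 are all invented for the illustration.

```python
from collections import deque


class Connection:
    """A queue between two processors with an optional backpressure threshold."""

    def __init__(self, name, threshold=None):
        self.name = name
        self.threshold = threshold  # None means no backpressure configured
        self.queue = deque()

    def is_full(self):
        return self.threshold is not None and len(self.queue) >= self.threshold


def run(connections, ticks):
    """Simulate source -> c[0] -> p1 -> c[1] -> ... -> c[-1] -> stopped sink.

    Each tick, every processor moves at most one item from its incoming
    connection to its outgoing one, unless the outgoing connection is full.
    The terminal processor never runs, so the last queue only fills up.
    """
    for tick in range(ticks):
        # Walk from downstream to upstream so an item moves one hop per tick.
        for i in range(len(connections) - 1, 0, -1):
            incoming, outgoing = connections[i - 1], connections[i]
            if incoming.queue and not outgoing.is_full():
                outgoing.queue.append(incoming.queue.popleft())
        # The source is scheduled only while its outgoing connection has room.
        if not connections[0].is_full():
            connections[0].queue.append(tick)
    return {c.name: len(c.queue) for c in connections}


# Backpressure on the last connection only: data piles up one hop upstream.
only_last = run(
    [Connection("src->p1"), Connection("p1->p2"), Connection("p2->sink", 5)],
    ticks=100,
)
print(only_last)  # {'src->p1': 1, 'p1->p2': 94, 'p2->sink': 5}

# Backpressure on every connection: the source itself stops after 15 items.
every = run(
    [Connection("src->p1", 5), Connection("p1->p2", 5), Connection("p2->sink", 5)],
    ticks=100,
)
print(every)  # {'src->p1': 5, 'p1->p2': 5, 'p2->sink': 5}
```

In the first run the source keeps producing because nothing limits its own outgoing queue. In the second run each full queue stops the processor feeding it, so the limit reaches the source.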
>>>> 
>>>> The reason behind this is that we can have many different sources, each 
>>>> routing data to
>>>> many different destinations. So if the queue before a 'terminal processor' 
>>>> is filled,
>>>> we won't want to prevent data from coming in from some source if only some 
>>>> portion of
>>>> that data will go to that processor.
>>>> 
>>>> For example, consider the following flow:
>>>> 
>>>> A --> B --> C --> D
>>>>             ^
>>>> E --> F ----|
>>>>             v
>>>>             G
>>>> 
>>>> Where Processor A sends 100% of data to B and then C and D.
>>>> Maybe only 1% of data from Processor E makes its way to D, though,
>>>> and 99% of its data goes to G instead.
>>>> 
>>>> If the queue from C to D fills up, we may not want to stop the data flowing
>>>> in from E because most of its data is going to G. Or we may want to stop 
>>>> data
>>>> coming in from E only if the queue from F to C backs up to, say, 100,000 
>>>> FlowFiles.
>>>> 
>>>> By ensuring that backpressure is applied only to that one connection, we 
>>>> can leverage
>>>> this to control which sources stop bringing in data when.
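
A small sketch of the example flow above, assuming the rule stated earlier in this message: a processor stops being scheduled while one of its outgoing connections has backpressure applied. The `Connection` dataclass, the `unscheduled_processors` helper, the queue sizes, and the 10,000 threshold on C to D are hypothetical. The 100,000 threshold on F to C is the figure from the message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    source: str
    destination: str
    queued: int                      # FlowFiles currently in the queue
    threshold: Optional[int] = None  # None: no backpressure configured

    def backpressure_applied(self) -> bool:
        return self.threshold is not None and self.queued >= self.threshold


def unscheduled_processors(connections: List[Connection]) -> List[str]:
    """Processors held back because one of their outgoing queues is full."""
    return sorted({c.source for c in connections if c.backpressure_applied()})


def example_flow(f_to_c_queued: int) -> List[Connection]:
    """The A..G flow from the message, with the C->D queue already full."""
    return [
        Connection("A", "B", queued=10),
        Connection("B", "C", queued=10),
        Connection("C", "D", queued=10_000, threshold=10_000),
        Connection("E", "F", queued=10),
        Connection("F", "C", queued=f_to_c_queued, threshold=100_000),
        Connection("F", "G", queued=10),
    ]


# C->D is full, so only C is held back; E and F keep feeding G.
print(unscheduled_processors(example_flow(f_to_c_queued=500)))      # ['C']

# Once F->C also reaches its threshold, F is held back as well.
print(unscheduled_processors(example_flow(f_to_c_queued=100_000)))  # ['C', 'F']
```

E is never held back in this sketch because E to F has no threshold. Setting one there is what would eventually stop data coming in from E.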
>>>> 
>>>> Hopefully this provided some clarification of how this works and why it 
>>>> was done this way
>>>> rather than confusing you more :)
>>>> 
>>>> However, I can see the benefit in setting a backpressure threshold only 
>>>> once. And I think
>>>> there are a couple of possible improvements here:
>>>> 
>>>> (1) We could allow the user to select multiple connections and then 
>>>> configure backpressure 
>>>> and have that applied to all selected connections.
>>>> 
>>>> (2) We could allow the user to set the backpressure and indicate that it 
>>>> should be propagated back
>>>> to all upstream connections.  This feels a little more dangerous, though, 
>>>> because it would be easy
>>>> to change configurations inadvertently.
>>>> 
>>>> Hopefully this helps!
>>>> 
>>>> Thanks
>>>> -Mark
>>>> 
>>>> 
>>>> 
>>>>> On Apr 15, 2016, at 12:52 PM, McDermott, Chris Kevin (MSDU - 
>>>>> STaTS/StorefrontRemote) <[email protected]> wrote:
>>>>> 
>>>>> Can anyone point me to some documentation, or just explain to me, how 
>>>>> back pressure is supposed to work?
>>>>> 
>>>>> I am trying to limit the amount of storage used for queued files in my 
>>>>> flow.  To that end I have a connection near the end of the flow that I’ve 
>>>>> put a limit on.  When that limit is reached I assumed that back pressure 
>>>>> would limit the output of the processors all the way back up stream.  I 
>>>>> find that that is not the case and large numbers of files are being 
>>>>> queued in upstream connections.
>>>>> 
>>>>> Given this, can someone explain how back pressure can be employed to 
>>>>> achieve my goal of limiting storage usage for in-flight files?
>>>>> 
>>>>> Thanks,
>>>>> Chris
>>>> 
>>
