Chris,

I like the idea of providing a way to enforce backpressure based on how full
the content or FlowFile repository is.
I would imagine that this would be something that we would also configure on a
connection, just like the other backpressure settings, so that we could allow,
for example, more "important" or more time-sensitive data to come into the flow
even if the repository is 90% full, whereas other data may not be allowed to
enter once the repo hits 60% full. That way we ensure that we have room for
the more important data.
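
To make that concrete, here is a rough Python sketch of the kind of check I am
picturing. Nothing here exists in NiFi today: `Connection`, `max_repo_usage`,
and `accepts_data` are names made up for illustration, and the 95% and 60%
cutoffs are only example values.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a connection; not NiFi's actual class."""
    name: str
    # Assumed setting: stop accepting new data once the repository
    # reaches this fraction of its capacity (0.0 to 1.0).
    max_repo_usage: float


def accepts_data(conn: Connection, repo_usage: float) -> bool:
    """Return True while repository usage is below the connection's cutoff."""
    return repo_usage < conn.max_repo_usage


# Time-sensitive data is still admitted when the repository is 90% full;
# other data is turned away once the repository hits 60% full.
priority = Connection("time-sensitive", max_repo_usage=0.95)
bulk = Connection("bulk", max_repo_usage=0.60)

for usage in (0.50, 0.75, 0.90, 0.97):
    print(usage, accepts_data(priority, usage), accepts_data(bulk, usage))
```

The point of putting the cutoff on the connection is that each ingress path
can get its own limit, the same way the existing backpressure thresholds work.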

Is this what you had in mind?

Thanks
-Mark


> On Apr 15, 2016, at 2:38 PM, McDermott, Chris Kevin (MSDU - 
> STaTS/StorefrontRemote) <[email protected]> wrote:
> 
> Thanks for the clarification and explanation of the design philosophy. It 
> does make sense. I think it comes down to me trying to use back pressure for 
> a purpose for which it was not designed.
> 
> What if there were a way to configure a processor to be paused when 
> available disk space drops below some threshold?  That way ingress 
> processors, as identified by the user, could be prevented from flooding the 
> system with too much data.  Thoughts?
> 
> Chris
> 
> 
> 
> 
> On 4/15/16, 1:12 PM, "Mark Payne" <[email protected]> wrote:
> 
>> Chris,
>> 
>> When you apply backpressure to that connection, it will cause the processor 
>> that is
>> the source of the connection to stop being scheduled to run until the queue 
>> clears out.
>> However, as you noted, data will still queue up in that processor's incoming 
>> connections.
>> So to force backpressure to propagate all the way back to the source, you 
>> would
>> need to configure each of the connections in the flow to have backpressure 
>> applied.
>> 
>> The reason behind this is that we can have many different sources, each 
>> routing data to
>> many different destinations. So if the queue before a 'terminal processor' 
>> is filled,
>> we won't want to prevent data from coming in from some source if only some 
>> portion of
>> that data will go to that processor.
>> 
>> For example, consider the following flow:
>> 
>> A --> B --> C --> D
>>             ^
>> E --> F ----|
>>             v
>>             G
>> 
>> Where Processor A sends 100% of data to B and then C and D.
>> Maybe only 1% of data from Processor E makes its way to D, though,
>> and 99% of its data goes to G instead.
>> 
>> If the queue from C to D fills up, we may not want to stop the data flowing
>> in from E because most of its data is going to G. Or we may want to stop data
>> coming in from E only if the queue from F to C backs up to, say, 100,000 
>> FlowFiles.
>> 
>> By ensuring that backpressure is applied only to that one connection, we can 
>> leverage
>> this to control which sources stop bringing in data when.
>> 
>> Hopefully this provided some clarification of how this works and why it was 
>> done this way
>> rather than confusing you more :)
>> 
>> However, I can see the benefit in setting a backpressure threshold only 
>> once. And I think
>> there are a couple of possible improvements here:
>> 
>> (1) We could allow the user to select multiple connections and then 
>> configure backpressure 
>> and have that applied to all selected connections.
>> 
>> (2) We could allow the user to set the backpressure and indicate that it 
>> should be propagated back
>> to all upstream connections.  This feels a little more dangerous, though, 
>> because it would be easy
>> to change configurations inadvertently.
>> 
>> Hopefully this helps!
>> 
>> Thanks
>> -Mark
>> 
>> 
>> 
>>> On Apr 15, 2016, at 12:52 PM, McDermott, Chris Kevin (MSDU - 
>>> STaTS/StorefrontRemote) <[email protected]> wrote:
>>> 
>>> Can anyone point me to some documentation, or just explain to me, how back 
>>> pressure is supposed to work?
>>> 
>>> I am trying to limit the amount of storage used for queued files in my 
>>> flow.  To that end I have a connection near the end of the flow that I’ve 
>>> put a limit on.  When that limit is reached I assumed that back pressure 
>>> would limit the output of the processors all the way back up stream.  I 
>>> find that that is not the case and large numbers of files are being queued 
>>> in upstream connections.
>>> 
>>> Given this, can someone explain how back pressure can be employed to achieve 
>>> my goal of limiting storage usage for in-flight files?
>>> 
>>> Thanks,
>>> Chris
>> 
