Thanks for the clarification and explanation of the design philosophy. It does make sense. I think it comes down to me trying to use back pressure for a purpose for which it was not designed.
What if there were a way to configure a processor to be paused when available disk space drops below some threshold? That way ingress processors, as identified by the user, could be prevented from flooding the system with too much data. Thoughts?

Chris

On 4/15/16, 1:12 PM, "Mark Payne" <[email protected]> wrote:

>Chris,
>
>When you apply backpressure to that connection, it will cause the processor
>that is the source of the connection to stop being scheduled to run until the
>queue clears out. However, as you noted, data will still queue up in that
>processor's incoming connections. So to force backpressure to propagate all
>the way back to the source, you would need to configure each of the
>connections in the flow to have backpressure applied.
>
>The reason behind this is that we can have many different sources, each
>routing data to many different destinations. So if the queue before a
>'terminal processor' is filled, we won't want to prevent data from coming in
>from some source if only some portion of that data will go to that processor.
>
>For example, consider the following flow:
>
>A --> B --> C --> D
>            ^
>E --> F ----|
>      v
>      G
>
>Where Processor A sends 100% of data to B and then C and D.
>Maybe only 1% of data from Processor E makes its way to D, though,
>and 99% of its data goes to G instead.
>
>If the queue from C to D fills up, we may not want to stop the data flowing
>in from E because most of its data is going to G. Or we may want to stop data
>coming in from E only if the queue from F to C backs up to, say, 100,000
>FlowFiles.
>
>By ensuring that backpressure is applied only to that one connection, we can
>leverage this to control which sources stop bringing in data when.
>
>Hopefully this provided some clarification of how this works and why it was
>done this way rather than confusing you more :)
>
>However, I can see the benefit in setting a backpressure threshold only once.
>And I think there are a couple of possible improvements here:
>
>(1) We could allow the user to select multiple connections and then configure
>backpressure and have that applied to all selected connections.
>
>(2) We could allow the user to set the backpressure and indicate that it
>should be propagated back to all upstream connections. This feels a little
>more dangerous, though, because it would be easy to change configurations
>inadvertently.
>
>Hopefully this helps!
>
>Thanks
>-Mark
>
>
>
>> On Apr 15, 2016, at 12:52 PM, McDermott, Chris Kevin (MSDU -
>> STaTS/StorefrontRemote) <[email protected]> wrote:
>>
>> Can anyone point me to some documentation, or just explain to me, how back
>> pressure is supposed to work?
>>
>> I am trying to limit the amount of storage used for queued files in my flow.
>> To that end I have a connection near the end of the flow that I’ve put a
>> limit on. When that limit is reached I assumed that back pressure would
>> limit the output of the processors all the way back upstream. I find that
>> that is not the case and large numbers of files are being queued in upstream
>> connections.
>>
>> Given this, can someone explain how back pressure can be employed to achieve
>> my goal of limiting storage usage for in-flight files?
>>
>> Thanks,
>> Chris
>
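
To make the per-connection behavior described in the quoted reply concrete, here is a minimal toy simulation in Python. It is only a sketch and not NiFi code: the `Connection` class, the `run_linear_flow` function, the one-item-per-tick scheduling, and the object-count-only threshold are all assumptions made for illustration. It models a straight-line flow in which a processor is skipped whenever its outgoing connection is at its threshold.

```python
from collections import deque


class Connection:
    """Hypothetical queue between two processors with an optional count threshold."""

    def __init__(self, name, threshold=None):
        self.name = name
        self.threshold = threshold  # None means no backpressure configured
        self.queue = deque()

    def is_full(self):
        return self.threshold is not None and len(self.queue) >= self.threshold


def run_linear_flow(thresholds, ticks, sink_rate=0):
    """Simulate source -> c0 -> p1 -> c1 -> ... -> sink for `ticks` rounds.

    thresholds: one entry per connection, None for "no limit".
    Each processor moves at most one item per tick and is not scheduled
    when its outgoing connection is full. Returns the final queue sizes.
    """
    conns = [Connection(f"c{i}", t) for i, t in enumerate(thresholds)]
    for tick in range(ticks):
        # The sink drains the last connection at sink_rate items per tick.
        for _ in range(sink_rate):
            if conns[-1].queue:
                conns[-1].queue.popleft()
        # Intermediate processors run downstream first, so an item
        # advances at most one hop per tick.
        for i in range(len(conns) - 1, 0, -1):
            src, dst = conns[i - 1], conns[i]
            if src.queue and not dst.is_full():
                dst.queue.append(src.queue.popleft())
        # The source emits one item unless its outgoing connection is full.
        if not conns[0].is_full():
            conns[0].queue.append(tick)
    return [len(c.queue) for c in conns]


if __name__ == "__main__":
    # Threshold only on the last connection, with a stalled sink.
    print(run_linear_flow([None, None, 5], ticks=100))
    # Threshold on every connection, same stalled sink.
    print(run_linear_flow([5, 5, 5], ticks=100))
```

With a threshold of 5 on only the last connection and a stalled sink, the run ends with queue sizes `[1, 94, 5]`: the last queue is capped, but the connection just upstream of it absorbs everything else. With the same threshold on every connection the result is `[5, 5, 5]`, and the source itself stops producing. That matches the behavior described above: total queued data is bounded only when each connection on the path back to the source has its own limit.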
