Chris,

I like the idea of providing a way to enforce backpressure based on how full the content or FlowFile repository is. I would imagine that we would configure this on a connection, just as the existing backpressure is configured. That way we could allow, for example, more "important" or more time-sensitive data to come into the flow even if the repository is 90% full, whereas other data would not be allowed to enter once the repo hits 60% full. This ensures that we keep room for the more important data.
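To make the idea concrete, here is a rough sketch of a per-connection repository threshold. It is only an illustration of the proposal, not an existing NiFi feature: the `Connection` class, the `max_repo_usage` field, and the `backpressure_engaged` helper are all hypothetical names, and the sketch assumes repository fullness can be approximated by disk usage of the repository directory.

```python
import shutil
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a connection; not NiFi's actual class."""
    name: str
    max_repo_usage: float  # assumed setting: fraction of the repository, 0.0-1.0


def repo_usage(path: str) -> float:
    """Fraction of the disk holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def backpressure_engaged(conn: Connection, usage: float) -> bool:
    """True once repository usage reaches the connection's threshold."""
    return usage >= conn.max_repo_usage


# Time-sensitive data may enter until the repo is 90% full; bulk data only until 60%.
important = Connection("time-sensitive", max_repo_usage=0.90)
bulk = Connection("bulk", max_repo_usage=0.60)

for usage in (0.50, 0.75, 0.95):
    blocked = [c.name for c in (important, bulk) if backpressure_engaged(c, usage)]
    print(f"repo {usage:.0%} full -> blocked: {blocked}")
```

In a real check, `repo_usage(...)` would be called with the repository directory; the loop uses fixed values only so the behavior at each fill level is easy to see.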
Is this what you had in mind?

Thanks
-Mark

> On Apr 15, 2016, at 2:38 PM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) <[email protected]> wrote:
>
> Thanks for the clarification and explanation of the design philosophy. It
> does make sense. I think it comes down to me trying to use back pressure for
> a purpose for which it was not designed.
>
> What if there was a way to configure a processor to be paused based on
> available disk space dropping below some threshold? That way ingress
> processors, as identified by the user, could be prevented from flooding the
> system with too much data. Thoughts?
>
> Chris
>
> On 4/15/16, 1:12 PM, "Mark Payne" <[email protected]> wrote:
>
>> Chris,
>>
>> When you apply backpressure to that connection, it will cause the processor
>> that is the source of the connection to stop being scheduled to run until
>> the queue clears out. However, as you noted, data will still queue up in
>> that processor's incoming connections. So to force backpressure to propagate
>> all the way back to the source, you would need to configure each of the
>> connections in the flow to have backpressure applied.
>>
>> The reason behind this is that we can have many different sources, each
>> routing data to many different destinations. So if the queue before a
>> 'terminal processor' is filled, we won't want to prevent data from coming in
>> from some source if only some portion of that data will go to that processor.
>>
>> For example, consider the following flow:
>>
>> A --> B --> C --> D
>>             ^
>> E --> F ----|
>>       v
>>       G
>>
>> Where Processor A sends 100% of data to B and then C and D.
>> Maybe only 1% of data from Processor E makes its way to D, though,
>> and 99% of its data goes to G instead.
>>
>> If the queue from C to D fills up, we may not want to stop the data flowing
>> in from E because most of its data is going to G. Or we may want to stop data
>> coming in from E only if the queue from F to C backs up to say 100,000
>> FlowFiles.
>>
>> By ensuring that backpressure is applied only to that one connection, we can
>> leverage this to control which sources stop bringing in data when.
>>
>> Hopefully this provided some clarification of how this works and why it was
>> done this way rather than confusing you more :)
>>
>> However, I can see the benefit in setting a backpressure threshold only
>> once. And I think there are a couple of possible improvements here:
>>
>> (1) We could allow the user to select multiple connections and then
>> configure backpressure and have that applied to all selected connections.
>>
>> (2) We could allow the user to set the backpressure and indicate that it
>> should be propagated back to all upstream connections. This feels a little
>> more dangerous, though, because it would be easy to change configurations
>> inadvertently.
>>
>> Hopefully this helps!
>>
>> Thanks
>> -Mark
>>
>>> On Apr 15, 2016, at 12:52 PM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) <[email protected]> wrote:
>>>
>>> Can anyone point me to some documentation, or just explain to me, how back
>>> pressure is supposed to work?
>>>
>>> I am trying to limit the amount of storage used for queued files in my
>>> flow. To that end I have a connection near the end of the flow that I’ve
>>> put a limit on. When that limit is reached I assumed that back pressure
>>> would limit the output of the processors all the way back upstream. I
>>> find that that is not the case and large numbers of files are being queued
>>> in upstream connections.
>>>
>>> Given this, can someone explain how back pressure can be employed to achieve
>>> my goal of limiting storage usage for in-flight files?
>>>
>>> Thanks,
>>> Chris
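
For anyone following the thread, the per-connection behavior described for the A-to-G flow can be sketched as a toy model. This is not NiFi code: the `Queue` class, its default limit of 10,000, and the `can_run` helper are assumptions made for illustration, and the model assumes a processor stops being scheduled whenever any one of its outgoing queues is at its limit.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical connection: a FlowFile count plus a backpressure limit."""
    size: int = 0
    limit: int = 10_000  # assumed default threshold for this sketch

    def full(self) -> bool:
        return self.size >= self.limit


def can_run(outgoing: list[Queue]) -> bool:
    """A processor is scheduled only while none of its outgoing queues is full."""
    return not any(q.full() for q in outgoing)


# Connections of the example flow: A -> B -> C -> D, E -> F, F -> C, F -> G.
a_b, b_c, c_d = Queue(), Queue(), Queue()
e_f, f_c, f_g = Queue(), Queue(limit=100_000), Queue()

outgoing = {
    "A": [a_b], "B": [b_c], "C": [c_d],
    "E": [e_f], "F": [f_c, f_g],
}

# The queue from C to D fills up: only C stops being scheduled.
c_d.size = c_d.limit
print({name: can_run(qs) for name, qs in outgoing.items()})

# Once the queue from F to C also reaches its limit, F stops as well.
f_c.size = f_c.limit
print({name: can_run(qs) for name, qs in outgoing.items()})
```

In this model E keeps running until the queue from E to F itself fills, which matches the point that backpressure only reaches a source when each connection along the way has a threshold configured.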
