Chris, I did create a ticket for this [1].
Just wanted to reply to the email to close the loop on this. Feel free to update the ticket with any other thoughts or ideas that you may have.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-1793

> On Apr 19, 2016, at 11:51 AM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) wrote:
>
> Not exactly what I was thinking, but this is better!
>
> Thanks,
> Chris
>
> On 4/19/16, 8:44 AM, "Mark Payne" wrote:
>
>> Chris,
>>
>> I like the idea of providing a way to enforce backpressure based on how full the content or FlowFile repository is.
>> I would imagine that this would be something that we would also configure on a connection, just like the other backpressure is configured, so that we could allow, for example, more "important" or more time-sensitive data to come into the flow even if the repository is 90% full, whereas other data may not be allowed to enter once the repo hits 60% full, so that we ensure that we have room for the other data.
>>
>> Is this what you had in mind?
>>
>> Thanks
>> -Mark
>>
>>> On Apr 15, 2016, at 2:38 PM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) wrote:
>>>
>>> Thanks for the clarification and explanation of the design philosophy. It does make sense. I think it comes down to me trying to use back pressure for a purpose for which it was not designed.
>>>
>>> What if there was a way to configure a processor to be paused based on available disk space dropping below some threshold? That way ingress processors, as identified by the user, could be prevented from flooding the system with too much data. Thoughts?
>>>
>>> Chris
>>>
>>> On 4/15/16, 1:12 PM, "Mark Payne" wrote:
>>>
>>>> Chris,
>>>>
>>>> When you apply backpressure to that connection, it will cause the processor that is the source of the connection to stop being scheduled to run until the queue clears out. However, as you noted, data will still queue up in that processor's incoming connections. So to force backpressure to propagate all the way back to the source, you would need to configure each of the connections in the flow to have backpressure applied.
>>>>
>>>> The reason behind this is that we can have many different sources, each routing data to many different destinations. So if the queue before a 'terminal processor' is filled, we don't want to prevent data from coming in from some source if only some portion of that data will go to that processor.
>>>>
>>>> For example, consider the following flow:
>>>>
>>>> A --> B --> C --> D
>>>>             ^
>>>> E --> F ----|
>>>>       v
>>>>       G
>>>>
>>>> Here Processor A sends 100% of its data to B, then to C and D. Maybe only 1% of the data from Processor E makes its way to D, though, and 99% of it goes to G instead.
>>>>
>>>> If the queue from C to D fills up, we may not want to stop the data flowing in from E because most of its data is going to G. Or we may want to stop data coming in from E only if the queue from F to C backs up to, say, 100,000 FlowFiles.
>>>>
>>>> By ensuring that backpressure is applied only to that one connection, we can leverage this to control which sources stop bringing in data, and when.
>>>>
>>>> Hopefully this provided some clarification of how this works and why it was done this way rather than confusing you more :)
>>>>
>>>> However, I can see the benefit in setting a backpressure threshold only once.
>>>> And I think there are a couple of possible improvements here:
>>>>
>>>> (1) We could allow the user to select multiple connections, configure backpressure once, and have it applied to all selected connections.
>>>>
>>>> (2) We could allow the user to set the backpressure and indicate that it should be propagated back to all upstream connections. This feels a little more dangerous, though, because it would be easy to change configurations inadvertently.
>>>>
>>>> Hopefully this helps!
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>>> On Apr 15, 2016, at 12:52 PM, McDermott, Chris Kevin (MSDU - STaTS/StorefrontRemote) wrote:
>>>>>
>>>>> Can anyone point me to some documentation, or just explain to me, how back pressure is supposed to work?
>>>>>
>>>>> I am trying to limit the amount of storage used for queued files in my flow. To that end I have a connection near the end of the flow that I’ve put a limit on. When that limit is reached I assumed that back pressure would limit the output of the processors all the way back upstream. I find that that is not the case, and large numbers of files are being queued in upstream connections.
>>>>>
>>>>> Given this, can someone explain how back pressure can be employed to achieve my goal of limiting storage usage for in-flight files?
>>>>>
>>>>> Thanks,
>>>>> Chris
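A toy model may help make Mark's April 15 explanation concrete. It is a hypothetical sketch, not NiFi code or its API: `Connection`, `Flow`, `can_schedule`, `upstream_connections`, and `apply_threshold` are invented names, thresholds are counted in FlowFiles only, and the 10,000 limit on C -> D is made up (the 100,000 on F -> C echoes the thread). It encodes the rule described in the thread, that a processor stops being scheduled only when one of its own outgoing connections is full, and walks the graph backwards to find the connections that option (2) would touch.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    """A queue between two processors (hypothetical model, not the NiFi API)."""
    source: str
    dest: str
    threshold: Optional[int] = None  # max queued FlowFiles; None = no backpressure
    queued: int = 0

    def is_full(self) -> bool:
        return self.threshold is not None and self.queued >= self.threshold


class Flow:
    def __init__(self, connections: List[Connection]):
        self.connections = connections

    def outgoing(self, proc: str) -> List[Connection]:
        return [c for c in self.connections if c.source == proc]

    def incoming(self, proc: str) -> List[Connection]:
        return [c for c in self.connections if c.dest == proc]

    def can_schedule(self, proc: str) -> bool:
        # Backpressure is per connection: a processor is paused only when one
        # of its OWN outgoing connections is full; upstream queues are untouched.
        return not any(c.is_full() for c in self.outgoing(proc))

    def upstream_connections(self, conn: Connection) -> List[Connection]:
        """Breadth-first walk against the arrows, collecting every connection
        that feeds `conn` directly or indirectly (what option (2) would change)."""
        seen, result = set(), []
        frontier = deque([conn.source])
        while frontier:
            proc = frontier.popleft()
            for c in self.incoming(proc):
                if id(c) not in seen:  # dataclass instances are unhashable
                    seen.add(id(c))
                    result.append(c)
                    frontier.append(c.source)
        return result


def apply_threshold(connections: List[Connection], threshold: int) -> None:
    """Options (1)/(2): set one threshold on several connections at once."""
    for c in connections:
        c.threshold = threshold


def build_example_flow() -> Flow:
    # A --> B --> C --> D;  E --> F;  F --> C (about 1% of E's data);  F --> G (about 99%)
    return Flow([
        Connection("A", "B"),
        Connection("B", "C"),
        Connection("C", "D", threshold=10_000),
        Connection("E", "F"),
        Connection("F", "C", threshold=100_000),
        Connection("F", "G"),
    ])


if __name__ == "__main__":
    flow = build_example_flow()
    c_to_d = flow.connections[2]
    c_to_d.queued = 10_000  # D falls behind, so C -> D reaches its threshold
    for proc in "ABCEF":
        print(proc, "scheduled" if flow.can_schedule(proc) else "paused")
    print(sorted(f"{c.source}->{c.dest}" for c in flow.upstream_connections(c_to_d)))
```

In this model only C is paused; A, B, E and F keep running, so B -> C and F -> C keep growing, which matches what Chris observed. The upstream set for C -> D is A -> B, B -> C, E -> F and F -> C. Because it includes E -> F, propagating one threshold back from C -> D would also throttle E even though most of its data goes to G, which is the kind of inadvertent change Mark flags for option (2).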

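Mark's April 19 suggestion (a per-connection limit on how full the content or FlowFile repository may be) and Chris's earlier idea of pausing ingress processors when disk space runs low could be modeled roughly as follows. This is a hypothetical sketch, not existing NiFi behavior or API: `IngressConnection`, `max_repo_usage`, and `admitted` are invented names, and a repository's fill level is approximated by Python's `shutil.disk_usage` on the file system that holds it. The 90% and 60% limits mirror the example in the thread.

```python
import shutil
from dataclasses import dataclass
from typing import List


@dataclass
class IngressConnection:
    """Hypothetical per-connection limit on repository usage (not a NiFi property)."""
    name: str
    max_repo_usage: float  # e.g. 0.60 = stop accepting once the repo is 60% full

    def accepts(self, usage: float) -> bool:
        # Below its own limit the connection accepts data; at or above it, the
        # source processor would be treated as backpressured.
        return usage < self.max_repo_usage


def repo_usage(path: str) -> float:
    """Fraction of the file system holding `path` that is in use
    (an approximation of how full a repository stored there is)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def admitted(connections: List[IngressConnection], usage: float) -> List[str]:
    """Names of the connections whose source would still be scheduled."""
    return [c.name for c in connections if c.accepts(usage)]


if __name__ == "__main__":
    conns = [
        IngressConnection("time-sensitive", max_repo_usage=0.90),
        IngressConnection("bulk", max_repo_usage=0.60),
    ]
    for usage in (0.50, 0.75, 0.95):
        print(f"{usage:.0%} full -> accepting: {admitted(conns, usage)}")
    # "." stands in for a hypothetical content repository directory.
    print(f"current usage of '.': {repo_usage('.'):.1%}")
```

With these limits, both connections accept data at 50% full, only the time-sensitive one at 75%, and neither at 95%. Putting the limit on each connection rather than on the processor is what lets the more important data keep flowing after bulk ingest has been held back.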