It looks like the way I think about it might be a bit off base. :)

On Fri, Apr 7, 2017 at 2:31 PM Joe Witt <[email protected]> wrote:
> The concept of run duration there is one of the ways we allow users to
> hint to the framework what their preference is. In general all users
> want the thing to 'go fast'. But what 'fast' means for you is
> throughput and what fast means for someone else is low latency.
>
> What this really means under the covers at this point is that for
> processors which are willing to delegate the responsibility of 'when
> to commit what they've done in a transactional sense' to the framework
> then the framework can use that knowledge to automatically combine one
> or more transactions into a single transaction. This has the effect
> of trading off some very small latency for what is arguably higher
> throughput because what that means is we can do a single write to our
> flowfile repository instead of many. This reduces burden on various
> locks, the file system/interrupts, etc.. It is in general just a bit
> more friendly and does indeed have the effect of higher throughput.
>
> Now, with regard to what should be the default value we cannot really
> know whether one prefers, generically speaking, to have the system
> operate more latency sensitive or more throughput sensitive. Further,
> it isn't really that tight of a relationship. Also, consider that in
> a given NiFi cluster it can have and handle flows from numerous teams
> and organizations at the same time. Each with its own needs and
> interests and preferences. So, we allow it to be selected.
>
> As to the question about some processors supporting it and some not
> the reason for this is simply that sometimes the processor cannot and
> is not willing to let the framework choose when to commit the session.
> Why? Because they might have operations which are not 'side effect
> free' meaning once they've done something the environment has been
> altered in ways that cannot be recovered from. Take for example a
> processor which sends data via SFTP.
> Once a given file is sent we
> cannot 'unsend it' nor can we simply repeat that process without a
> side effect. By allowing the framework to handle it for the processor
> the point is that the operation can be easily undone/redone within the
> confines of NiFi and not have changed some external system state. So,
> this is a really important thing to appreciate.
>
> Thanks
> Joe
>
> On Fri, Apr 7, 2017 at 2:18 PM, Jeff <[email protected]> wrote:
> > James,
> >
> > The way I look at it (abstractly speaking) is that the slider
> > represents how long a processor will be able to use a thread to work
> > on flowfiles (from its inbound queue, allowing onTrigger to run more
> > times to generate more outbound flowfiles, etc). Moving that slider
> > towards higher throughput, the processor will do more work, but will
> > hog that thread for a longer period of time before another processor
> > can use it. So, overall latency could go down, because flowfiles will
> > sit in other queues for possibly longer periods of time before
> > another processor gets a thread to start doing work, but that
> > particular processor will probably see higher throughput.
> >
> > That's in pretty general terms, though.
> >
> > On Fri, Apr 7, 2017 at 9:49 AM James McMahon <[email protected]> wrote:
> >>
> >> I see that some processors provide a slider to set a balance between
> >> Latency and Throughput. Not all processors provide this, but some do.
> >> They seem to be inversely related.
> >>
> >> I also notice that the default appears to be Lower latency, implying
> >> also lower throughput. Why is that the default? I would think that
> >> being a workflow, maximizing throughput would be the ultimate goal.
> >> Yet it seems that the processors opt for defaults to lowest latency,
> >> lowest throughput.
> >>
> >> What is the relationship between Latency and Throughput? Do most
> >> folks in the user group typically go in and change that to Highest
> >> on throughput?
> >> Is that something to avoid because of demands on CPU, RAM, and
> >> disk IO?
> >>
> >> Thanks very much. -Jim
>
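
The trade-off Joe describes (combining several transactions into one commit) can be illustrated with a small simulation. This is only a sketch and not NiFi code: the function `simulate_commits`, its parameters, and the fixed per-flowfile work time are hypothetical, introduced purely for illustration. It counts how many commits (repository writes) happen when work may be batched for up to a given run duration, and how long the first flowfile of a batch may wait before its batch is committed.

```python
def simulate_commits(n_flowfiles, work_ms, run_duration_ms):
    """Toy model of commit batching (hypothetical, not the NiFi API).

    Each flowfile takes `work_ms` to process. Work keeps accumulating
    in one batch until at least `run_duration_ms` has elapsed, then the
    whole batch is committed with a single write.

    Returns (commits, worst_wait_ms), where worst_wait_ms is the longest
    time a finished flowfile waited for its batch to be committed.
    """
    commits = 0
    worst_wait_ms = 0
    pending = 0       # flowfiles processed but not yet committed
    elapsed_ms = 0    # time spent in the current batch

    for _ in range(n_flowfiles):
        elapsed_ms += work_ms
        pending += 1
        if elapsed_ms >= run_duration_ms:
            commits += 1
            # The first flowfile of the batch finished at work_ms and
            # waited until elapsed_ms for the commit.
            worst_wait_ms = max(worst_wait_ms, elapsed_ms - work_ms)
            pending = 0
            elapsed_ms = 0

    if pending:
        # Commit whatever is left when the input runs out.
        commits += 1
        worst_wait_ms = max(worst_wait_ms, elapsed_ms - work_ms)

    return commits, worst_wait_ms


if __name__ == "__main__":
    # Run duration 0: one commit per flowfile, no added wait.
    print(simulate_commits(100, 1, 0))    # (100, 0)
    # Run duration 25 ms: far fewer commits, a small added wait.
    print(simulate_commits(100, 1, 25))   # (4, 24)
```

Under these assumptions, raising the run duration from 0 to 25 ms cuts the number of commits from 100 to 4 while adding at most 24 ms of wait, which matches the "some very small latency for arguably higher throughput" description above.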
