The concept of run duration there is one of the ways we allow users to hint to the framework what their preference is. In general, all users want the thing to 'go fast'. But what 'fast' means for you might be throughput, while what 'fast' means for someone else is low latency.
What this really means under the covers is that for processors which are willing to delegate the responsibility of 'when to commit what they've done, in a transactional sense' to the framework, the framework can use that knowledge to automatically combine what would otherwise be several transactions into a single transaction. This has the effect of trading a very small amount of latency for what is arguably higher throughput, because it means we can do a single write to our flowfile repository instead of many. That reduces the burden on various locks, the file system/interrupts, etc. It is in general just a bit friendlier and does indeed have the effect of higher throughput.

Now, with regard to what the default value should be: we cannot really know whether a user, generically speaking, prefers the system to operate in a more latency-sensitive or a more throughput-sensitive way. Further, it isn't really that tight of a relationship. Also, consider that a given NiFi cluster can host and handle flows from numerous teams and organizations at the same time, each with its own needs, interests, and preferences. So, we allow it to be selected.

As to the question about some processors supporting it and some not: the reason is simply that sometimes the processor cannot, or is not willing to, let the framework choose when to commit the session. Why? Because it might have operations which are not 'side effect free', meaning once it has done something, the environment has been altered in ways that cannot be recovered from. Take for example a processor which sends data via SFTP. Once a given file is sent we cannot 'unsend' it, nor can we simply repeat that process without a side effect. The point of letting the framework handle the commit for a processor is that the operation can be easily undone/redone within the confines of NiFi without having changed some external system's state. So, this is a really important thing to appreciate.

Thanks
Joe

On Fri, Apr 7, 2017 at 2:18 PM, Jeff <[email protected]> wrote:
> James,
>
> The way I look at it (abstractly speaking) is that the slider represents
> how long a processor will be able to use a thread to work on flowfiles
> (from its inbound queue, allowing onTrigger to run more times to generate
> more outbound flowfiles, etc.). Moving that slider towards higher
> throughput, the processor will do more work, but will hog that thread for
> a longer period of time before another processor can use it. So, overall
> latency could go up, because flowfiles will sit in other queues for
> possibly longer periods of time before another processor gets a thread to
> start doing work, but that particular processor will probably see higher
> throughput.
>
> That's in pretty general terms, though.
>
> On Fri, Apr 7, 2017 at 9:49 AM James McMahon <[email protected]> wrote:
>>
>> I see that some processors provide a slider to set a balance between
>> Latency and Throughput. Not all processors provide this, but some do.
>> They seem to be inversely related.
>>
>> I also notice that the default appears to be Lower latency, implying
>> also lower throughput. Why is that the default? I would think that,
>> being a workflow, maximizing throughput would be the ultimate goal. Yet
>> it seems that the processors opt for defaults of lowest latency, lowest
>> throughput.
>>
>> What is the relationship between Latency and Throughput? Do most folks
>> in the user group typically go in and change that to Highest throughput?
>> Is that something to avoid because of demands on CPU, RAM, and disk IO?
>>
>> Thanks very much. -Jim
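
For anyone who wants to see what 'delegating the commit to the framework' looks like in processor code, here is a minimal sketch, not taken from the thread, of a processor that opts in via the @SupportsBatching annotation from the standard nifi-api. That annotation is the usual way a processor gets the Run Duration slider; the class and relationship names below are illustrative, and a processor with external side effects (such as an SFTP sender) would simply omit the annotation and manage its own commit semantics.

import java.util.Set;

import org.apache.nifi.annotation.behavior.SupportsBatching;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

// @SupportsBatching signals that this processor's work is side-effect free,
// so the framework may combine several session commits into a single
// flowfile-repository write, based on the configured Run Duration.
@SupportsBatching
public class RouteOnlyProcessor extends AbstractProcessor {

    // Illustrative relationship; a real processor would document it fully.
    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles are routed here after processing")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        final FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }

        // ... purely internal work on the FlowFile (attributes/content) goes here ...

        session.transfer(flowFile, REL_SUCCESS);
        // No explicit commit here: AbstractProcessor commits the session after
        // onTrigger returns, and with @SupportsBatching the framework may defer
        // and batch that commit with others, trading a little latency for
        // higher throughput as described above.
    }
}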
