It looks like the way I think about it might be a bit off base. :)
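The commit-combining Joe describes below can be pictured with a toy model. This is NOT NiFi's actual code; in real NiFi a processor opts in by carrying the @SupportsBatching annotation, and the run duration slider then controls how long the framework may batch. The sketch below just shows the effect: many logical session commits collapsed into far fewer repository writes.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingDemo {
    /** Pretend flowfile repository; counts how often we actually write. */
    static class FlowFileRepo {
        int writes = 0;
        void write(List<String> sessionResults) { writes++; }
    }

    /** Toy framework loop: trigger the processor many times, commit in batches. */
    static int runWithBatching(int triggers, int batchSize, FlowFileRepo repo) {
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < triggers; i++) {
            pending.add("session-" + i);        // processor did some work
            if (pending.size() == batchSize) {  // framework decides when to commit
                repo.write(pending);
                pending.clear();
            }
        }
        if (!pending.isEmpty()) repo.write(pending); // flush the remainder
        return repo.writes;
    }

    public static void main(String[] args) {
        // 100 onTrigger invocations, committed one at a time: 100 writes.
        int unbatched = runWithBatching(100, 1, new FlowFileRepo());
        // Same work, commits combined 25 at a time: only 4 writes.
        int batched = runWithBatching(100, 25, new FlowFileRepo());
        System.out.println(unbatched + " writes vs " + batched + " writes");
    }
}
```

The first flowfile in each batch waits for its batchmates before its result is durable, which is exactly the small latency cost Joe mentions.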

On Fri, Apr 7, 2017 at 2:31 PM Joe Witt <[email protected]> wrote:

> The concept of run duration there is one of the ways we allow users to
> hint to the framework what their preference is.  In general all users
> want the thing to 'go fast'.  But what 'fast' means for you is
> throughput and what fast means for someone else is low latency.
>
> What this really means under the covers, at this point, is that for
> processors which are willing to delegate to the framework the
> responsibility of 'when to commit what they've done in a transactional
> sense', the framework can use that knowledge to automatically combine
> one or more transactions into a single transaction.  This has the
> effect of trading a very small amount of latency for what is arguably
> higher throughput, because it means we can do a single write to our
> flowfile repository instead of many.  This reduces the burden on
> various locks, the file system/interrupts, etc.  It is in general just
> a bit friendlier and does indeed have the effect of higher throughput.
>
> Now, with regard to what the default value should be, we cannot really
> know whether one prefers, generically speaking, to have the system
> operate in a more latency-sensitive or more throughput-sensitive way.
> Further, it isn't really that tight of a relationship.  Also, consider
> that a given NiFi cluster can host and handle flows from numerous
> teams and organizations at the same time, each with its own needs,
> interests, and preferences.  So, we allow it to be selected.
>
> As to the question of why some processors support it and some do not:
> sometimes the processor cannot, or is not willing to, let the
> framework choose when to commit the session.  Why?  Because it might
> perform operations which are not 'side effect free', meaning that once
> it has done something, the environment has been altered in ways that
> cannot be recovered from.  Take for example a processor which sends
> data via SFTP.  Once a given file is sent we cannot 'unsend' it, nor
> can we simply repeat that process without a side effect.  The point of
> letting the framework handle commits for a processor is that the
> operation can be easily undone/redone within the confines of NiFi
> without having changed some external system state.  So, this is a
> really important thing to appreciate.
>
> Thanks
> Joe
>
> On Fri, Apr 7, 2017 at 2:18 PM, Jeff <[email protected]> wrote:
> > James,
> >
> > The way I look at it (abstractly speaking) is that the slider
> > represents how long a processor will be able to use a thread to work
> > on flowfiles (from its inbound queue, allowing onTrigger to run more
> > times to generate more outbound flowfiles, etc.).  Moving that slider
> > towards higher throughput, the processor will do more work, but will
> > hog that thread for a longer period of time before another processor
> > can use it.  So, overall latency could go up, because flowfiles will
> > sit in other queues for possibly longer periods of time before
> > another processor gets a thread to start doing work, but that
> > particular processor will probably see higher throughput.
> >
> > That's in pretty general terms, though.
> >
> > On Fri, Apr 7, 2017 at 9:49 AM James McMahon <[email protected]> wrote:
> >>
> >> I see that some processors provide a slider to set a balance between
> >> Latency and Throughput. Not all processors provide this, but some
> >> do. They seem to be inversely related.
> >>
> >> I also notice that the default appears to be Lower latency, implying
> >> also lower throughput. Why is that the default? I would think that
> >> being a workflow, maximizing throughput would be the ultimate goal.
> >> Yet it seems that the processors opt for defaults to lowest latency,
> >> lowest throughput.
> >>
> >> What is the relationship between Latency and Throughput? Do most
> >> folks in the user group typically go in and change that to Highest
> >> on throughput? Is that something to avoid because of demands on CPU,
> >> RAM, and disk IO?
> >>
> >> Thanks very much. -Jim
>
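The tradeoff the thread describes can be put in rough numbers. Using made-up illustrative costs (these are assumptions, not NiFi measurements): if each unit of work takes 2 ms and each repository commit costs a fixed 10 ms, then batching B units per commit amortizes the commit cost, raising throughput while also raising the worst-case wait before a result is committed.

```java
public class TradeoffSketch {
    // Made-up illustrative costs, NOT measured NiFi figures.
    static final double WORK_MS = 2.0;    // per unit of processor work
    static final double COMMIT_MS = 10.0; // per repository write

    /** Units processed per second when commits are batched B at a time. */
    static double throughputPerSec(int b) {
        return 1000.0 * b / (b * WORK_MS + COMMIT_MS);
    }

    /** Worst-case time (ms) before the first unit's result is committed. */
    static double latencyMs(int b) {
        return b * WORK_MS + COMMIT_MS;
    }

    public static void main(String[] args) {
        for (int b : new int[] {1, 10, 100}) {
            System.out.printf("batch=%3d  throughput=%6.1f/s  latency=%5.1f ms%n",
                    b, throughputPerSec(b), latencyMs(b));
        }
    }
}
```

With these numbers, batch=1 gives about 83 units/s at 12 ms latency, while batch=100 gives about 476 units/s at 210 ms latency: both metrics rise together, which is why neither end of the slider is "correct" as a universal default.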
