Thank you, Joe. Where can I go to see further details about optimizing some
of these factors you mention? Adding threads to the controller, for
instance? Adjusting the run duration?  -Jim

On Wed, Apr 5, 2017 at 1:58 PM, Joe Witt <[email protected]> wrote:

> There are a number of things you can do for a given flow to improve
> its performance:
> - Add tasks to the component(s)
> - Adjust scheduling period
> - Add threads to the controller overall
> - Adjust the run duration to tell NiFi you are willing to trade a small
> amount of latency for higher throughput
>
> Be sure your IO is not bottlenecked.  Use 'iostat' and other tools to
> observe this over time.  Make sure not all repos are on the same
> partition and creating undue contention.
>
> And a whole lot more.
>
> Then there is the inherent performance/design of the flow itself.  In
> this case the flow has to spin up a Jython step each time.  I've
> observed that this is pretty slow.  I recommend not adding
> tasks/threads until single-task performance matches what you expect
> from a single thread.  Can this flow be brought into NiFi itself using
> available processors?  If not, try Groovy instead, as its performance
> is far higher than Jython's.
>
> HandleHttpRequest/HandleHttpResponse can be quite quick, so if that is
> the bottleneck something is really wrong.
>
> Thanks
> Joe
>
> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
> > Jim,
> >
> > One quick thing you can try is to use GenerateFlowFile to send to your
> > ExecuteScript instead of HandleHttpRequest. You can configure it to
> > send whatever body with whatever attributes (such as you would get
> > from HandleHttpRequest) and send files at whatever rate the processor
> > is scheduled. This might take ExecuteScript out of the bottleneck
> > equation; if you are getting plenty of throughput without
> > HandleHttpRequest then HandleHttpRequest is probably your bottleneck.
> >
> > I'm not sure offhand about optimizations for HandleHttpRequest,
> > perhaps someone else will jump in :)
> >
> > Regards,
> > Matt
> >
> >
> > On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
> >> I am receiving POSTs from a Pentaho process, delivering files to my NiFi
> >> 0.7.x workflow HandleHttpRequest processor. That processor hands the
> >> flowfile off to an ExecuteScript processor that runs a Python script. This
> >> script is very, very simple: it takes an incoming JSON object and loads it
> >> into a Python dictionary, and verifies the presence of required fields using
> >> simple has_key checks on the dictionary. There are only eight fields in the
> >> incoming JSON object.
> >>
> >> The throughput for these two processes is not exceeding 100-150 files in
> >> five minutes. It seems very slow in light of the minimal processing going on
> >> in these two steps.
> >>
> >> I notice that there are configuration operations seemingly related to
> >> optimizing performance. "Concurrent tasks", for example,  is only set by
> >> default to 1 for each processor.
> >>
> >> What performance optimizations at the processor level do users recommend? Is
> >> it advisable to crank up the concurrent tasks for a processor, and is there
> >> an optimal performance point beyond which you should not crank up that
> >> value? Are there trade-offs?
> >>
> >> I am particularly interested in optimizations for HandleHttpRequest and
> >> ExecuteScript processors.
> >>
> >> Thanks in advance for your thoughts.
> >>
> >> cheers,
> >>
> >> Jim
>
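
For reference, the field check described in the original message could be sketched roughly as follows. This is only a minimal sketch in plain Python 3, not the poster's actual script: the field names in REQUIRED_FIELDS and the function name missing_fields are hypothetical placeholders (the thread only says there are eight required fields), and `in` is used in place of has_key, which exists only on Python 2 / Jython 2.x dictionaries.

```python
import json

# Hypothetical field names: the thread does not list the eight required fields.
REQUIRED_FIELDS = (
    "id", "source", "timestamp", "type",
    "owner", "path", "size", "checksum",
)


def missing_fields(raw_json, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from a JSON object string."""
    record = json.loads(raw_json)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    # `name in record` is the portable equivalent of record.has_key(name).
    return [name for name in required if name not in record]


if __name__ == "__main__":
    sample = json.dumps({"id": 1, "source": "pentaho"})
    # Prints the six placeholder fields that the sample lacks.
    print(missing_fields(sample))
```

In an ExecuteScript processor the raw JSON would come from the flowfile content; that wiring is left out here. A check of this size does very little work per file, which fits Joe's suggestion to look at the cost of the scripting step itself before adding tasks or threads.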
