Another thing to consider is whether the bottleneck is in NiFi or before the
data gets there.  Is the data source, as configured, capable of making POST
requests any faster than that?  Is network latency or throughput a
limitation?  You might try another HTTP server to see whether the problem
is within NiFi.

E.g. modify something like https://gist.github.com/bradmontgomery/2219997
to log requests and see if the rate is similar even when no other
processing is done on the server side.
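
A minimal sketch of such a test server. It is written for Python 3's standard-library http.server and is not copied from the gist. It accepts POSTs, discards the body, answers 200, and prints a running requests-per-second figure every LOG_EVERY requests. The port (8011) and the logging interval are arbitrary choices, not anything NiFi or Pentaho requires. Point the Pentaho process at it instead of HandleHttpRequest and compare the rates.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RateCounter:
    """Thread-safe count of POSTs received, with a since-creation rate."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0
        self.start = time.monotonic()

    def hit(self):
        # Called once per request; returns the new total.
        with self._lock:
            self.count += 1
            return self.count

    def rate(self):
        # Average requests per second since this counter was created.
        elapsed = time.monotonic() - self.start
        with self._lock:
            count = self.count
        return count / elapsed if elapsed > 0 else 0.0


COUNTER = RateCounter()
LOG_EVERY = 50  # arbitrary: print a rate line every 50 requests


class PostSink(BaseHTTPRequestHandler):
    """Accepts POSTs, throws the body away, and replies 200 with no body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the request body
        n = COUNTER.hit()  # count before replying so the total is current
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()
        if n % LOG_EVERY == 0:
            print("%d POSTs, %.2f/s" % (n, COUNTER.rate()), flush=True)

    def log_message(self, format, *args):
        # Silence the default one-line-per-request log; the rate line suffices.
        pass


def serve(port=8011):
    """Serve on all interfaces until interrupted (port is an arbitrary choice)."""
    server = ThreadingHTTPServer(("0.0.0.0", port), PostSink)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

To start it, save it as, say, sink.py (a hypothetical name) and run python3 -c 'import sink; sink.serve()'. If the rate it reports is close to what NiFi sees, the limit is most likely on the sending side or in the network rather than in NiFi.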

Thanks,
Bryan

On Wed, Apr 5, 2017 at 1:58 PM, Joe Witt wrote:

> There are a number of things you can do for a given flow to improve
> its performance:
> - Add concurrent tasks to the component(s)
> - Adjust the scheduling period
> - Add threads to the controller overall (the maximum timer-driven
> thread count)
> - Adjust the run duration to tell NiFi you are willing to trade a
> little latency for higher throughput
>
> Be sure your I/O is not bottlenecked.  Use 'iostat' and other tools to
> observe this over time.  Make sure the repositories (FlowFile, content,
> provenance) are not all on the same partition, creating undue contention.
>
> And a whole lot more.
>
> Then there is the inherent performance/design of the flow itself.  In
> this case the flow has to spin up a Jython step each time, which in my
> experience is pretty slow.  I recommend not adding tasks/threads until
> single-task performance matches what you would expect from a single
> thread.  Can this flow be built in NiFi itself using the available
> processors?  If not, try Groovy instead, as it performs far better
> than Jython.
>
> HandleHttpRequest/HandleHttpResponse can be quite fast, so if they are
> the bottleneck, something is really wrong.
>
> Thanks
> Joe
>
> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess wrote:
> > Jim,
> >
> > One quick thing you can try is to feed your ExecuteScript from
> > GenerateFlowFile instead of HandleHttpRequest. You can configure it to
> > send whatever body with whatever attributes (such as those you would
> > get from HandleHttpRequest), and it will send files at whatever rate
> > the processor is scheduled. This might rule ExecuteScript out as the
> > bottleneck: if you are getting plenty of throughput without
> > HandleHttpRequest, then that's probably your bottleneck.
> >
> > I'm not sure offhand about optimizations for HandleHttpRequest;
> > perhaps someone else will jump in :)
> >
> > Regards,
> > Matt
> >
> >
> > On Wed, Apr 5, 2017 at 1:48 PM, James McMahon wrote:
> >> I am receiving POSTs from a Pentaho process, delivering files to the
> >> HandleHttpRequest processor in my NiFi 0.7.x workflow. That processor
> >> hands the flowfile off to an ExecuteScript processor that runs a Python
> >> script. This script is very, very simple: it takes an incoming JSON
> >> object, loads it into a Python dictionary, and verifies the presence of
> >> required fields using simple has_key checks on the dictionary. There
> >> are only eight fields in the incoming JSON object.
> >>
> >> The throughput for these two processors does not exceed 100-150 files
> >> in five minutes. That seems very slow in light of the minimal
> >> processing going on in these two steps.
> >>
> >> I notice that there are configuration options seemingly related to
> >> optimizing performance. "Concurrent tasks", for example, is set by
> >> default to only 1 for each processor.
> >>
> >> What performance optimizations at the processor level do users
> >> recommend? Is it advisable to crank up the concurrent tasks for a
> >> processor, and is there an optimal point beyond which you should not
> >> raise that value? Are there trade-offs?
> >>
> >> I am particularly interested in optimizations for HandleHttpRequest and
> >> ExecuteScript processors.
> >>
> >> Thanks in advance for your thoughts.
> >>
> >> cheers,
> >>
> >> Jim
>
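
One way to check whether the script logic itself could explain the rate Jim describes is to time it outside NiFi. The following is a hypothetical reconstruction of the check, written as plain Python. The eight field names are invented placeholders, since the thread does not list them. It uses `name in record` rather than `has_key`: `has_key` exists only in Python 2, the dialect Jython implements, while `in` works in both.

```python
import json
import timeit

# Hypothetical field names; the thread says only that there are eight.
REQUIRED_FIELDS = ("id", "source", "timestamp", "type",
                   "size", "checksum", "filename", "owner")


def missing_fields(body, required=REQUIRED_FIELDS):
    """Parse a JSON object and return the required keys it lacks."""
    record = json.loads(body)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    # 'key in dict' works in Python 2 (and so Jython) as well as Python 3,
    # unlike dict.has_key, which Python 3 removed.
    return [name for name in required if name not in record]


if __name__ == "__main__":
    # Time the check on a record that has every field present.
    sample = json.dumps({name: "x" for name in REQUIRED_FIELDS})
    runs = 10000
    seconds = timeit.timeit(lambda: missing_fields(sample), number=runs)
    print("%.1f microseconds per record" % (seconds / runs * 1e6))
```

At 100-150 files in five minutes, each file takes roughly 2 to 3 seconds. If the per-record time printed here is far smaller than that, the parsing is not the bottleneck. The overhead is then more likely in how the script is invoked (Joe's point about Jython) or in HandleHttpRequest. Timing under CPython only bounds the cost of the logic itself, not Jython's per-invocation startup.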

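Joe advises keeping the repositories off a single partition. A small standard-library sketch can report which repository directories share a filesystem device. It assumes the repository locations are the nifi.properties entries whose keys contain "repository.directory" (for example nifi.content.repository.directory.default). It also assumes relative paths resolve from the NiFi home directory. Check both assumptions against your own install. The properties parser is deliberately minimal and ignores line continuations and escapes.

```python
import os
from collections import defaultdict


def read_properties(path):
    """Minimal key=value parser for a Java-style .properties file.

    Assumption: no line continuations or escaped characters, which is
    enough for the repository directory entries.
    """
    props = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(("#", "!")) or "=" not in line:
                continue  # skip blanks, comments, and non key=value lines
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def repositories_by_device(props, nifi_home):
    """Map each filesystem device id to the repository keys stored on it."""
    groups = defaultdict(list)
    for key, value in sorted(props.items()):
        # Assumption: repository paths use keys containing "repository.directory".
        if "repository.directory" not in key or not value:
            continue
        # Relative paths resolve from NiFi home; absolute paths pass through.
        path = os.path.join(nifi_home, value)
        # os.stat raises FileNotFoundError if the directory does not exist yet.
        groups[os.stat(path).st_dev].append(key)
    return dict(groups)
```

For example, call `repositories_by_device(read_properties("/opt/nifi/conf/nifi.properties"), "/opt/nifi")`, where the /opt/nifi path is hypothetical. A result with a single device key means every repository sits on one partition, the setup Joe warns against. iostat will then show whether that device is saturated.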