There are a number of things you can do for a given flow to improve
its performance:
- Add concurrent tasks to the processor(s)
- Adjust the scheduling period
- Add threads to the controller overall (the maximum timer-driven
  thread count)
- Adjust the run duration to tell NiFi you are willing to trade a
  small amount of latency for higher throughput

Be sure your IO is not bottlenecked.  Use 'iostat' (for example,
'iostat -xm 5') and other tools to observe this over time.  Make sure
the repositories (flowfile, content, provenance) are not all on the
same partition, creating undue contention.

And a whole lot more.

Then there is the inherent performance/design of the flow itself.  In
this case the flow has to spin up a Jython engine for each execution,
which I've observed to be pretty slow.  I recommend not adding
tasks/threads until single-task performance is in line with what you
expect for a single thread.  Can this flow be implemented within NiFi
itself using available processors?  If not, try Groovy instead, as
its performance is far higher than Jython's.
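
For reference, here is a rough sketch of what a validation step like
the one described below might look like as an ExecuteScript Jython
body.  It assumes the standard ExecuteScript bindings (session,
REL_SUCCESS, REL_FAILURE), and the REQUIRED field names are
illustrative:

    import json
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import InputStreamCallback

    # Illustrative names; substitute the eight required fields.
    REQUIRED = ['id', 'name']

    class Validate(InputStreamCallback):
        def __init__(self):
            self.valid = False
        def process(self, inputStream):
            # Read the flowfile content and parse it as JSON
            text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            data = json.loads(text)
            # 'k in data' is the idiomatic form of data.has_key(k)
            self.valid = all(k in data for k in REQUIRED)

    flowFile = session.get()
    if flowFile is not None:
        cb = Validate()
        session.read(flowFile, cb)
        session.transfer(flowFile, REL_SUCCESS if cb.valid else REL_FAILURE)

Even with a script this small, the per-invocation engine overhead
tends to dominate, which is why native processors or Groovy usually
win.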

HandleHttpRequest/Response can be quite quick, so if that is the
bottleneck something is really wrong.

Thanks
Joe

On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
> Jim,
>
> One quick thing you can try is to use GenerateFlowFile to feed your
> ExecuteScript instead of HandleHttpRequest; you can configure it to
> send whatever body with whatever attributes (such as those you would
> get from HandleHttpRequest) and to send files at whatever rate the
> processor is scheduled. This might take ExecuteScript out of the
> bottleneck equation; if you are getting plenty of throughput without
> HandleHttpRequest, then HandleHttpRequest is probably your bottleneck.
>
> I'm not sure offhand about optimizations for HandleHttpRequest;
> perhaps someone else will jump in :)
>
> Regards,
> Matt
>
>
> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>> I am receiving POSTs from a Pentaho process, delivering files to the
>> HandleHttpRequest processor in my NiFi 0.7.x workflow. That processor hands
>> the flowfile off to an ExecuteScript processor that runs a Python script.
>> The script is very, very simple: it takes the incoming JSON object, loads it
>> into a Python dictionary, and verifies the presence of required fields using
>> simple has_key checks on the dictionary. There are only eight fields in the
>> incoming JSON object.
>>
>> The throughput for these two processors does not exceed 100-150 files in
>> five minutes. That seems very slow given the minimal processing going on in
>> these two steps.
>>
>> I notice that there are configuration options seemingly related to
>> optimizing performance. "Concurrent tasks", for example, is only set by
>> default to 1 for each processor.
>>
>> What performance optimizations at the processor level do users recommend? Is
>> it advisable to crank up the concurrent tasks for a processor, and is there
>> an optimal point beyond which you should not increase that value? Are there
>> trade-offs?
>>
>> I am particularly interested in optimizations for HandleHttpRequest and
>> ExecuteScript processors.
>>
>> Thanks in advance for your thoughts.
>>
>> cheers,
>>
>> Jim
