In my experience, an ExecuteScript that processes a single FlowFile
per invocation is very inefficient when there are multiple FlowFiles
in the input queue, even with the run schedule set to a 0 sec timer.
Instead, I have the following in all of my Python scripts:
flowFiles = session.get(10)
for flowFile in flowFiles:
    if flowFile is None:
        continue
    # Do stuff here
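(session.get(10) returns a list of up to ten FlowFiles, or an empty
list when the queue is empty, so the loop simply does nothing when
there is no work. Each FlowFile you pull still has to be routed
somewhere, e.g. session.transfer(flowFile, REL_SUCCESS), as part of
the "do stuff" section.)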
That seems to improve the throughput of the ExecuteScript processor
dramatically.
YMMV
- Scott
James McMahon <[email protected]>
Wednesday, April 5, 2017 12:48 PM
I am receiving POSTs from a Pentaho process, delivering files to the
HandleHttpRequest processor in my NiFi 0.7.x workflow. That processor
hands the flowfile off to an ExecuteScript processor that runs a
Python script. The script is very, very simple: it takes the incoming
JSON object, loads it into a Python dictionary, and verifies the
presence of required fields using simple has_key checks on the
dictionary. There are only eight fields in the incoming JSON object.
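In outline the script does nothing more than the following (the field
names below are stand-ins for the real eight):

import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

REQUIRED = ['fieldA', 'fieldB']  # stand-ins; the real list has eight names

# Callback that reads the FlowFile content into a string
class ReadContent(InputStreamCallback):
    def __init__(self):
        self.text = None
    def process(self, inputStream):
        self.text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

flowFile = session.get()
if flowFile is not None:
    reader = ReadContent()
    session.read(flowFile, reader)
    record = json.loads(reader.text)
    if all(record.has_key(f) for f in REQUIRED):
        session.transfer(flowFile, REL_SUCCESS)
    else:
        session.transfer(flowFile, REL_FAILURE)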
The throughput across these two processors does not exceed 100-150
files in five minutes. That seems very slow in light of the minimal
processing going on in these two steps.
I notice that there are configuration options that seem related to
optimizing performance. "Concurrent Tasks", for example, defaults to
only 1 for each processor.
What performance optimizations at the processor level do users
recommend? Is it advisable to crank up Concurrent Tasks for a
processor, and is there a point beyond which raising that value no
longer helps? Are there trade-offs?
I am particularly interested in optimizations for HandleHttpRequest
and ExecuteScript processors.
Thanks in advance for your thoughts.
cheers,
Jim