I am receiving POSTs from a Pentaho process, which delivers files to the
HandleHttpRequest processor in my NiFi 0.7.x workflow. That processor hands
the flowfile off to an ExecuteScript processor that runs a Python script.
The script is very simple: it loads the incoming JSON object into a Python
dictionary and verifies the presence of required fields using has_key
checks on the dictionary. There are only eight fields in the incoming JSON
object.
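
To make the amount of work concrete, the check amounts to roughly the
following. This is a simplified sketch, not my actual script: the field
names are hypothetical placeholders, the NiFi session handling is left out,
and it uses the "in" operator, which behaves like has_key here.

```python
import json

# Hypothetical field names, for illustration only; the real object has
# eight required fields that are not listed in this post.
REQUIRED_FIELDS = ("id", "source", "timestamp", "payload")


def missing_fields(raw_json, required=REQUIRED_FIELDS):
    """Return the required keys that are absent from the JSON object."""
    record = json.loads(raw_json)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    # 'name not in record' is the equivalent of 'not record.has_key(name)'
    return [name for name in required if name not in record]
```

An empty list means the object has every required field; anything else
names the fields that are missing.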

Throughput across these two processors does not exceed 100-150 files in
five minutes. That seems very slow given the minimal processing going on
in these two steps.
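
Put as a rate, the arithmetic works out as in this small sketch. The
per-file figure is only a rough reading, and it assumes files are handled
strictly one at a time.

```python
def files_per_second(file_count, window_seconds):
    """Average throughput over an observation window."""
    return file_count / float(window_seconds)


WINDOW_SECONDS = 5 * 60  # the five-minute window quoted above

low = files_per_second(100, WINDOW_SECONDS)
high = files_per_second(150, WINDOW_SECONDS)

print("%.2f to %.2f files/second" % (low, high))
# Assumption: one file at a time, so the time per file is the inverse.
print("%.1f to %.1f seconds per file" % (1 / low, 1 / high))
```

That is well under one file per second, or roughly two to three seconds
per file.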

I notice that there are configuration options seemingly related to
optimizing performance. "Concurrent tasks", for example, is set to only 1
by default for each processor.

What performance optimizations at the processor level do users recommend?
Is it advisable to increase the concurrent tasks for a processor, and is
there a point beyond which raising that value stops helping? Are there
trade-offs?

I am particularly interested in optimizations for the HandleHttpRequest and
ExecuteScript processors.

Thanks in advance for your thoughts.

cheers,

Jim
