Jim,

One quick thing you can try is to feed your ExecuteScript from GenerateFlowFile instead of HandleHttpRequest. You can configure it to send whatever body and attributes you would get from HandleHttpRequest, and it will send flow files at whatever rate the processor is scheduled. This lets you test ExecuteScript on its own: if you get plenty of throughput without HandleHttpRequest, then HandleHttpRequest is probably your bottleneck.
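As a complementary check outside NiFi, here is a rough sketch that times the kind of validation you describe (parse the JSON, then check for required keys) in plain Python. The field names and the function names missing_fields and time_checks are placeholders I made up, since your message only says there are eight fields. It uses the `in` operator rather than has_key so it also runs on Python 3. Timings under Jython inside ExecuteScript will differ, so treat the result only as a baseline for the logic itself.

```python
import json
import time

# Placeholder names: the real eight field names are not given in the thread.
REQUIRED_FIELDS = ["field%d" % i for i in range(1, 9)]


def missing_fields(body, required=REQUIRED_FIELDS):
    """Parse a JSON object and return the required keys it lacks."""
    record = json.loads(body)
    return [name for name in required if name not in record]


def time_checks(body, iterations=10000):
    """Return the seconds spent running the check `iterations` times."""
    start = time.time()
    for _ in range(iterations):
        missing_fields(body)
    return time.time() - start


if __name__ == "__main__":
    # Build a sample body that contains every required field.
    sample = json.dumps(dict((name, "x") for name in REQUIRED_FIELDS))
    elapsed = time_checks(sample)
    print("10000 checks took %.3f s" % elapsed)
```

If the check itself runs far faster than 100-150 files in five minutes, the time is going somewhere other than the parsing and key checks.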
I'm not sure offhand about optimizations for HandleHttpRequest; perhaps someone else will jump in :)

Regards,
Matt

On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
> I am receiving POSTs from a Pentaho process, delivering files to my NiFi
> 0.7.x workflow HandleHttpRequest processor. That processor hands the
> flowfile off to an ExecuteScript processor that runs a python script. This
> script is very, very simple: it takes an incoming JSON object and loads it
> into a Python dictionary, and verifies the presence of required fields using
> simple has_key checks on the dictionary. There are only eight fields in the
> incoming JSON object.
>
> The throughput for these two processes is not exceeding 100-150 files in
> five minutes. It seems very slow in light of the minimal processing going on
> in these two steps.
>
> I notice that there are configuration options seemingly related to
> optimizing performance. "Concurrent tasks", for example, is only set by
> default to 1 for each processor.
>
> What performance optimizations at the processor level do users recommend? Is
> it advisable to crank up the concurrent tasks for a processor, and is there
> an optimal performance point beyond which you should not crank up that
> value? Are there trade-offs?
>
> I am particularly interested in optimizations for HandleHttpRequest and
> ExecuteScript processors.
>
> Thanks in advance for your thoughts.
>
> cheers,
>
> Jim
