I am receiving POSTs from a Pentaho process, which delivers files to a HandleHttpRequest processor in my NiFi 0.7.x workflow. That processor hands the flowfile off to an ExecuteScript processor that runs a Python script. The script is very, very simple: it loads the incoming JSON object into a Python dictionary and verifies the presence of required fields using simple has_key checks on the dictionary. There are only eight fields in the incoming JSON object.
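For context, the script is roughly along these lines (a simplified sketch, not the exact code; the field names here are made up, and it assumes the usual ExecuteScript Jython pattern of reading the flowfile content via an InputStreamCallback):

import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

# Callback that reads the full flowfile content into a string
class ReadCallback(InputStreamCallback):
    def __init__(self):
        self.content = None
    def process(self, inputStream):
        self.content = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

flowFile = session.get()
if flowFile is not None:
    callback = ReadCallback()
    session.read(flowFile, callback)
    data = json.loads(callback.content)

    # Placeholder names -- the real script checks eight required fields
    required = ['field1', 'field2', 'field3']
    if all(data.has_key(f) for f in required):
        session.transfer(flowFile, REL_SUCCESS)
    else:
        session.transfer(flowFile, REL_FAILURE)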
The throughput through these two processors is only about 100-150 files every five minutes, which seems very slow given the minimal processing going on in these two steps. I notice there are configuration options that appear related to performance tuning; "Concurrent Tasks", for example, defaults to 1 for each processor. What performance optimizations at the processor level do users recommend? Is it advisable to crank up the concurrent tasks for a processor, and is there an optimal point beyond which you should not raise that value? Are there trade-offs? I am particularly interested in optimizations for the HandleHttpRequest and ExecuteScript processors. Thanks in advance for your thoughts. cheers, Jim
