Thank you, Joe. Where can I go to see further details about optimizing some of the factors you mention? Adding threads to the controller, for instance, or adjusting the run duration?

-Jim

On Wed, Apr 5, 2017 at 1:58 PM, Joe Witt <[email protected]> wrote:
> There are a number of things you can do for a given flow to improve
> its performance:
> - Add tasks to the component(s)
> - Adjust scheduling period
> - Add threads to the controller overall
> - Adjust the run duration to tell nifi you are willing to trade small
>   latency for higher throughput
>
> Be sure your IO is not bottlenecked. Use 'iostat' and other tools to
> observe this over time. Make sure not all repos are on the same
> partition and creating undue contention.
>
> And a whole lot more.
>
> Then there is the inherent performance/design of the flow itself. In
> this case the thing has to spin up a jython step each time. This is
> pretty slow I've observed. I recommend avoiding adding tasks/threads
> until the single task performance makes sense to what you expect for a
> single thread. Can this flow be brought into NiFi itself using
> available processors? If not, try Groovy instead as performance for
> that is far higher than jython.
>
> HandleHttpRequest/Response can be quite quick so if that is the
> bottleneck something is really wrong.
>
> Thanks
> Joe
>
> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
> > Jim,
> >
> > One quick thing you can try is to use GenerateFlowFile to send to your
> > ExecuteScript instead of HandleHttpRequest, you can configure it to
> > send whatever body with whatever attributes (such that you would get
> > from HandleHttpRequest) and send files at whatever rate the processor
> > is scheduled. This might take ExecuteScript out of the bottleneck
> > equation; if you are getting plenty of throughput without
> > HandleHttpRequest then that's probably your bottleneck.
> >
> > I'm not sure offhand about optimizations for HandleHttpRequest,
> > perhaps someone else will jump in :)
> >
> > Regards,
> > Matt
> >
> >
> > On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
> >> I am receiving POSTs from a Pentaho process, delivering files to my NiFi
> >> 0.7.x workflow HandleHttpRequest processor. That processor hands the
> >> flowfile off to an ExecuteScript processor that runs a python script. This
> >> script is very, very simple: it takes an incoming JSON object and loads it
> >> into a Python dictionary, and verifies the presence of required fields using
> >> simple has_key checks on the dictionary. There are only eight fields in the
> >> incoming JSON object.
> >>
> >> The throughput for these two processes is not exceeding 100-150 files in
> >> five minutes. It seems very slow in light of the minimal processing going on
> >> in these two steps.
> >>
> >> I notice that there are configuration operations seemingly related to
> >> optimizing performance. "Concurrent tasks", for example, is only set by
> >> default to 1 for each processor.
> >>
> >> What performance optimizations at the processor level do users recommend? Is
> >> it advisable to crank up the concurrent tasks for a processor, and is there
> >> an optimal performance point beyond which you should not crank up that
> >> value? Are there trade-offs?
> >>
> >> I am particularly interested in optimizations for HandleHttpRequest and
> >> ExecuteScript processors.
> >>
> >> Thanks in advance for your thoughts.
> >>
> >> cheers,
> >>
> >> Jim
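
For reference, the validation step described in the original message (load a JSON object into a dictionary, then check that the required fields are present) might look roughly like the sketch below. This is plain Python and not the actual script from the thread: the field names in REQUIRED_FIELDS and the function name missing_fields are hypothetical placeholders, since the thread only says there are eight required fields. It uses the `in` operator rather than has_key, because has_key exists only in Python 2 / Jython while `in` works in both. Timing something like this outside NiFi could help show whether the script logic itself is slow or whether the cost lies in the script engine startup mentioned above.

```python
import json

# Hypothetical field names: the thread only says eight fields are required.
REQUIRED_FIELDS = (
    "id", "source", "timestamp", "type",
    "owner", "path", "size", "checksum",
)


def missing_fields(raw_json, required=REQUIRED_FIELDS):
    """Return the required fields absent from a JSON object, in declared order."""
    record = json.loads(raw_json)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    # `name in record` replaces the Python 2-only `record.has_key(name)`.
    return [name for name in required if name not in record]
```

An empty list means the object has every required field; a non-empty list names what is missing, which a flow could use to decide between a success and a failure route.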
