Thank you, Bryan. I will explore these things. I suspect we are not receiving from the source optimally. The reason I say that is this: I am doing manual refreshes on my flow page every 3 to 4 seconds. Frequently I go through 3 or 4 refreshes and no figures change in my queues or in my processors. It seems like my workflow is just sitting there waiting for new arrivals. I am using ports 8446 and 9448 (I have two HandleHttpRequest processors now). Does anyone know of a few commands I can use to monitor arrivals of incoming POSTs at my ports? Is this something I can monitor using the FF developer features? -Jim
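One way to watch those two ports directly on the NiFi host, independent of UI refreshes, is to poll the OS connection table every few seconds. Below is a minimal sketch, assuming Python with the psutil package installed (psutil is not mentioned in the thread; tcpdump or netstat filtered on ports 8446/9448 would show much the same picture):

# Rough arrival monitor: every few seconds, count TCP connections on the two
# HandleHttpRequest ports, grouped by connection state. Run on the NiFi host.
# Assumes the psutil package (pip install psutil); some platforms need root
# to see every socket.
import time
from collections import Counter

import psutil

PORTS = {8446, 9448}  # the two HandleHttpRequest listening ports


def snapshot():
    counts = Counter()
    for conn in psutil.net_connections(kind='tcp'):
        # laddr[1] is the local port; skip the listening sockets themselves
        if conn.laddr and conn.laddr[1] in PORTS and conn.status != psutil.CONN_LISTEN:
            counts[(conn.laddr[1], conn.status)] += 1
    return counts


if __name__ == '__main__':
    while True:
        print(time.strftime('%H:%M:%S'), dict(snapshot()))
        time.sleep(3)

If the counts stay empty or never change between polls, nothing is reaching the ports and the slowdown is upstream of NiFi.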
On Wed, Apr 5, 2017 at 3:39 PM, Bryan Rosander <[email protected]> wrote:
> This seems to have gotten lost in the chain, resending (please disregard if you've already read/tried it):
>
> Another thing to consider is whether the bottleneck is in NiFi or before it gets there. Is the source of data capable of making POST requests more quickly than that, as configured? Is network latency or throughput a limitation? You might try posting to another HTTP server to see whether the problem is within NiFi.
>
> E.g. modify something like https://gist.github.com/bradmontgomery/2219997 to log requests and see if the rate is similar even when no other processing is done on the server side.
>
> If you go with the Python server, you may want to use the threading mixin as well.
>
> http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python
>
> Thanks,
> Bryan
>
> On Wed, Apr 5, 2017 at 3:19 PM, James McMahon <[email protected]> wrote:
>> We are not seeing 503s. We have tried setting up a second HandleHttpRequest, watching a different port, and round-robining to the two ports. We made a relatively low gain, from about 5 minutes for 100 files consistently to 4:40 for 100. I watch my workflow, and at no point does a large number of flowfiles queue up in any queue leading into or coming out of any processor.
>>
>> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <[email protected]> wrote:
>>> It looks like HandleHttpRequest should be sending back a 503 if its containerQueue fills up (default capacity of 50 requests that have been accepted but not processed in an onTrigger()) [1]. Also, the default thread pool the Jetty server is using should be able to create up to 200 threads to accept connections, and the handler is using an async context, so the in-flight flowfiles shouldn't be holding up new requests.
>>>
>>> If you're not seeing 503s it might be on the sender side of the equation. Is the sender doing POSTs concurrently, or waiting on each to complete before sending another?
>>>
>>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>>>
>>> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <[email protected]> wrote:
>>>> Much of this goodness can be found in the help->Users Guide. Adjusting run duration/scheduling factors:
>>>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>>>>
>>>> These are the latest docs but I'm sure there is coverage in the older stuff.
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <[email protected]> wrote:
>>>>> Yes sir! Sure am. And I know, because I have committed that very silly mistake before. We are indeed seeing # responses = # requests. -Jim
>>>>>
>>>>> On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <[email protected]> wrote:
>>>>>> Hey James,
>>>>>>
>>>>>> Are you making sure that every route from HandleHttpRequest goes to a HandleHttpResponse? If not, the StandardHttpContextMap may be filling up with requests, which would probably delay processing.
>>>>>>
>>>>>> Thanks,
>>>>>> Bryan
>>>>>>
>>>>>> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <[email protected]> wrote:
>>>>>>> Thank you very much Matt. I have cranked the Concurrent Tasks config parameter on my ExecuteScripts up to 20, and judging by the empty queue feeding that processor it is screaming through the flowfiles arriving at its doorstep.
>>>>>>>
>>>>>>> Can anyone comment on performance optimizations for HandleHttpRequest? In your experience, is HandleHttpRequest a bottleneck? I do notice that I often see a "flowfile in process" count within the processor, anywhere from 1 to 10 when it does show such a count.
>>>>>>>
>>>>>>> -Jim
>>>>>>>
>>>>>>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
>>>>>>>> Jim,
>>>>>>>>
>>>>>>>> One quick thing you can try is to use GenerateFlowFile to send to your ExecuteScript instead of HandleHttpRequest; you can configure it to send whatever body with whatever attributes (such as you would get from HandleHttpRequest) and send files at whatever rate the processor is scheduled. This might take ExecuteScript out of the bottleneck equation; if you are getting plenty of throughput without HandleHttpRequest then that's probably your bottleneck.
>>>>>>>>
>>>>>>>> I'm not sure offhand about optimizations for HandleHttpRequest, perhaps someone else will jump in :)
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>>>>>>>>> I am receiving POSTs from a Pentaho process, delivering files to my NiFi 0.7.x workflow's HandleHttpRequest processor. That processor hands the flowfile off to an ExecuteScript processor that runs a Python script. This script is very, very simple: it takes an incoming JSON object, loads it into a Python dictionary, and verifies the presence of required fields using simple has_key checks on the dictionary. There are only eight fields in the incoming JSON object.
>>>>>>>>>
>>>>>>>>> The throughput for these two processes is not exceeding 100-150 files in five minutes. That seems very slow in light of the minimal processing going on in these two steps.
>>>>>>>>>
>>>>>>>>> I notice that there are configuration options seemingly related to optimizing performance. "Concurrent tasks", for example, is only set by default to 1 for each processor.
>>>>>>>>>
>>>>>>>>> What performance optimizations at the processor level do users recommend? Is it advisable to crank up the concurrent tasks for a processor, and is there an optimal point beyond which you should not crank up that value? Are there trade-offs?
>>>>>>>>>
>>>>>>>>> I am particularly interested in optimizations for the HandleHttpRequest and ExecuteScript processors.
>>>>>>>>>
>>>>>>>>> Thanks in advance for your thoughts.
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>>
>>>>>>>>> Jim
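To try Bryan's suggestion of posting to a bare HTTP server outside NiFi, something along these lines could stand in for the gist he links: a minimal sketch, assuming Python 3, that simply counts POSTs so the sender's raw rate can be compared with what NiFi receives. The ThreadingMixIn follows the StackOverflow approach he mentions, and port 8446 here just mirrors one of the HandleHttpRequest ports.

# Threaded test server that only counts incoming POSTs, so the sender's raw
# rate can be measured with NiFi out of the loop. Assumes Python 3.
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True  # handle each request in its own daemon thread


class CountingHandler(BaseHTTPRequestHandler):
    lock = threading.Lock()
    total = 0

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)                 # drain and discard the body
        with CountingHandler.lock:
            CountingHandler.total += 1
            total = CountingHandler.total
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'ok')
        print('%s POST #%d' % (time.strftime('%H:%M:%S'), total))

    def log_message(self, fmt, *args):
        pass                                    # silence the default per-request log line


if __name__ == '__main__':
    ThreadedHTTPServer(('', 8446), CountingHandler).serve_forever()

Pointing the Pentaho sender at this for a few minutes gives a baseline: if the count still climbs at roughly 100-150 per five minutes, the bottleneck is on the sending side rather than in NiFi.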
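For context on the ExecuteScript step described at the bottom of the chain, a Jython validation of that shape would typically look like the sketch below. This is a guess at the general pattern, not Jim's actual script; the field names are hypothetical, and it assumes the Jython engine with Apache commons-io available on the ExecuteScript classpath.

# Hypothetical sketch of the kind of "check eight required JSON fields" script
# described in the thread; session, REL_SUCCESS and REL_FAILURE are the
# standard ExecuteScript bindings.
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

REQUIRED = ['id', 'timestamp', 'source', 'type']  # placeholder names, not Jim's real fields


class CheckFields(InputStreamCallback):
    def __init__(self):
        self.missing = []

    def process(self, inputStream):
        # Read the flowfile content, parse the JSON object, note any missing fields
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        data = json.loads(text)
        self.missing = [f for f in REQUIRED if f not in data]


flowFile = session.get()
if flowFile is not None:
    callback = CheckFields()
    session.read(flowFile, callback)
    if callback.missing:
        flowFile = session.putAttribute(flowFile, 'missing.fields', ','.join(callback.missing))
        session.transfer(flowFile, REL_FAILURE)
    else:
        session.transfer(flowFile, REL_SUCCESS)

A parse plus a few membership checks like this is unlikely to cap throughput at 100-150 files per five minutes, which is consistent with the empty queue Jim reports in front of ExecuteScript and points the investigation back toward HandleHttpRequest or the sender.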
