We are not seeing 503s. We have tried setting up a second HandleHttpRequest watching a different port and round-robining across the two ports. That yielded only a modest gain, from a consistent 5 minutes for 100 files down to about 4:40 for 100. I have watched my workflow, and at no point does a large number of flowfiles queue up in any connection leading into or out of any processor.
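In case it is relevant to Bryan's question below about the sender side, here is a rough sketch of the kind of concurrent sender we could use to rule that out. The host, ports, path, and payload are placeholders, not our actual values:

    # Rough sketch only: POST the test files in parallel rather than one at a time,
    # round-robining across the two HandleHttpRequest ports.
    # nifi-host, the ports, the path, and the payload are placeholders.
    import json
    import urllib.error
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    ENDPOINTS = ["http://nifi-host:8011/testPath",
                 "http://nifi-host:8012/testPath"]

    def post_one(i):
        body = json.dumps({"field_%d" % n: "value" for n in range(8)}).encode("utf-8")
        url = ENDPOINTS[i % len(ENDPOINTS)]              # alternate between the two ports
        req = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.getcode()                    # 200 once HandleHttpResponse replies
        except urllib.error.HTTPError as e:
            return e.code                                # e.g. 503 if the request queue is full

    with ThreadPoolExecutor(max_workers=20) as pool:
        codes = list(pool.map(post_one, range(100)))     # 100 files, up to 20 in flight
    print("%d of %d returned 200" % (codes.count(200), len(codes)))

If throughput scales with the worker count here, the bottleneck was the serial sender; if it stays near the 4-5 minute mark, it is on the NiFi side.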
On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <[email protected]> wrote:
> It looks like HandleHttpRequest should be sending back a 503 if its
> containerQueue fills up (default capacity of 50 requests that have been
> accepted but not processed in an onTrigger()) [1]. Also, the default
> thread pool the jetty server is using should be able to create up to 200
> threads to accept connections, and the handler is using an async context, so
> the in-flight flow files shouldn't be holding up new requests.
>
> If you're not seeing 503s it might be on the sender side of the equation.
> Is the sender doing posts concurrently or waiting on each to complete
> before sending another?
>
> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>
> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <[email protected]> wrote:
>
>> Much of this goodness can be found in the Help -> Users Guide.
>> Adjusting run duration/scheduling factors:
>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>>
>> These are the latest docs but I'm sure there is coverage in the older
>> stuff.
>>
>> Thanks
>>
>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <[email protected]> wrote:
>> > Yes sir! Sure am. And I know, because I have committed that very silly
>> > mistake before. We are indeed seeing # responses = # requests. -Jim
>> >
>> > On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <[email protected]> wrote:
>> >>
>> >> Hey James,
>> >>
>> >> Are you making sure that every route from HandleHttpRequest goes to a
>> >> HandleHttpResponse? If not, the StandardHttpContextMap may be filling up
>> >> with requests, which would probably delay processing.
>> >>
>> >> Thanks,
>> >> Bryan
>> >>
>> >> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <[email protected]> wrote:
>> >>>
>> >>> Thank you very much Matt. I have cranked my Concurrent Tasks config param
>> >>> on my ExecuteScripts up to 20, and judging by the empty queue feeding that
>> >>> processor it is screaming through the flowfiles arriving at its doorstep.
>> >>>
>> >>> Can anyone comment on performance optimizations for HandleHttpRequest? In
>> >>> your experience, is HandleHttpRequest a bottleneck? I do notice that I
>> >>> often have a count in the processor for "flowfiles in process" within the
>> >>> processor, anywhere from 1 to 10 when it does show such a count.
>> >>>
>> >>> -Jim
>> >>>
>> >>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
>> >>>>
>> >>>> Jim,
>> >>>>
>> >>>> One quick thing you can try is to use GenerateFlowFile to send to your
>> >>>> ExecuteScript instead of HandleHttpRequest. You can configure it to
>> >>>> send whatever body with whatever attributes (such as you would get
>> >>>> from HandleHttpRequest) and send files at whatever rate the processor
>> >>>> is scheduled. This might take ExecuteScript out of the bottleneck
>> >>>> equation; if you are getting plenty of throughput without
>> >>>> HandleHttpRequest then that's probably your bottleneck.
>> >>>>
>> >>>> I'm not sure offhand about optimizations for HandleHttpRequest;
>> >>>> perhaps someone else will jump in :)
>> >>>>
>> >>>> Regards,
>> >>>> Matt
>> >>>>
>> >>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>> >>>> > I am receiving POSTs from a Pentaho process, delivering files to my
>> >>>> > NiFi 0.7.x workflow's HandleHttpRequest processor. That processor hands
>> >>>> > the flowfile off to an ExecuteScript processor that runs a Python script.
>> >>>> > This script is very, very simple: it takes an incoming JSON object, loads
>> >>>> > it into a Python dictionary, and verifies the presence of required fields
>> >>>> > using simple has_key checks on the dictionary. There are only eight fields
>> >>>> > in the incoming JSON object.
>> >>>> >
>> >>>> > The throughput for these two processors is not exceeding 100-150 files
>> >>>> > in five minutes. That seems very slow in light of the minimal processing
>> >>>> > going on in these two steps.
>> >>>> >
>> >>>> > I notice that there are configuration options seemingly related to
>> >>>> > optimizing performance. "Concurrent Tasks", for example, is only set
>> >>>> > to 1 by default for each processor.
>> >>>> >
>> >>>> > What performance optimizations at the processor level do users recommend?
>> >>>> > Is it advisable to crank up the concurrent tasks for a processor, and is
>> >>>> > there an optimal performance point beyond which you should not crank up
>> >>>> > that value? Are there trade-offs?
>> >>>> >
>> >>>> > I am particularly interested in optimizations for the HandleHttpRequest
>> >>>> > and ExecuteScript processors.
>> >>>> >
>> >>>> > Thanks in advance for your thoughts.
>> >>>> >
>> >>>> > cheers,
>> >>>> >
>> >>>> > Jim
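For anyone picking this thread up later, a minimal sketch of the kind of ExecuteScript Jython body described in the original post might look roughly like the following. The required field names are illustrative placeholders, not the actual script:

    # Minimal sketch, not the actual script: read the flowfile content as JSON
    # and check for required fields, routing to success or failure.
    import json
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import InputStreamCallback

    class ReadJson(InputStreamCallback):
        def __init__(self):
            self.data = None
        def process(self, inputStream):
            self.data = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))

    REQUIRED = ["field_a", "field_b"]   # placeholders for the eight required fields

    flowFile = session.get()
    if flowFile is not None:
        reader = ReadJson()
        session.read(flowFile, reader)
        if all(reader.data.has_key(k) for k in REQUIRED):
            session.transfer(flowFile, REL_SUCCESS)
        else:
            session.transfer(flowFile, REL_FAILURE)

A script this small should not take seconds per file, which is why the suggestion above to swap in GenerateFlowFile is a good way to isolate whether HandleHttpRequest or ExecuteScript is the slow step.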
