Thanks very much Juan. I do not find Wireshark in the apps available to me, but will ask our infrastructure folks about that this morning. -Jim
On Wed, Apr 5, 2017 at 4:01 PM, Juan Sequeiros <[email protected]> wrote:

> If you have Wireshark you could use:
>
>     tshark -f "port 8446 or port 9448"
>
> On Wed, Apr 5, 2017 at 3:45 PM James McMahon <[email protected]> wrote:
>
>> Thank you Bryan. I will explore these things. I suspect we are not
>> receiving from the source optimally. The reason I say that is this: I am
>> doing manual refreshes of my flow page every 3 to 4 seconds. Frequently I
>> go through 3 or 4 refreshes and no figures change in my queues or in my
>> processors. It seems like my workflow is just sitting there waiting for
>> new arrivals.
>>
>> I am using ports 8446 and 9448 (I have two HandleHttpRequest processors
>> now). Does anyone know of a few commands I can use to monitor incoming
>> POSTs arriving at my ports? Is this something I can monitor using the FF
>> developer features? -Jim
>>
>> On Wed, Apr 5, 2017 at 3:39 PM, Bryan Rosander <[email protected]> wrote:
>>
>> This seems to have gotten lost in the chain, resending (please disregard
>> if you've already read/tried it):
>>
>> Another thing to consider is whether the bottleneck is in NiFi or before
>> the data gets there. Is the source of data capable of making POST requests
>> more quickly than that, as configured? Is network latency or throughput a
>> limitation? You might try posting to another HTTP server to see whether
>> the problem is within NiFi.
>>
>> E.g., modify something like https://gist.github.com/bradmontgomery/2219997
>> to log requests and see if the rate is similar even when no other
>> processing is done on the server side.
>>
>> If you go with the Python server, you may want to use the threading mixin
>> as well:
>>
>> http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python
>>
>> Thanks,
>> Bryan
>>
>> On Wed, Apr 5, 2017 at 3:19 PM, James McMahon <[email protected]> wrote:
>>
>> We are not seeing 503s.
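Bryan's suggestion (a standalone server that logs requests, with the threading mixin so concurrent POSTs overlap) might look roughly like this in Python 3; the gist he links is Python 2-era, and the port here just mirrors one of the HandleHttpRequest ports from the thread, so treat the details as assumptions:

```python
# A minimal sketch of Bryan's "post to another HTTP server" test:
# accept POSTs, log each one, do no other processing. Counting the
# per-request log lines gives the raw arrival rate outside NiFi.
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handle each request in its own thread so concurrent POSTs overlap."""
    daemon_threads = True  # don't block shutdown on in-flight requests

class LoggingPostHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the body so the sender isn't blocked
        self.send_response(200)  # also logs one line per request to stderr
        self.end_headers()
        self.wfile.write(b"ok")

if __name__ == "__main__":
    # 8446 is assumed here only because it is one of the NiFi ports above.
    ThreadedHTTPServer(("0.0.0.0", 8446), LoggingPostHandler).serve_forever()
```

If the sender achieves a similar rate against this server, the bottleneck is likely upstream of NiFi rather than in HandleHttpRequest.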
>> We have tried setting up a second HandleHttpRequest watching a different
>> port, and "round-robin"ing to the two ports. We made a relatively small
>> gain, from about 5 minutes for 100 files consistently to 4:40 for 100. I
>> watch my workflow, and at no point does a large number of flowfiles queue
>> up in any queue leading into or coming out of any processor.
>>
>> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <[email protected]> wrote:
>>
>> It looks like HandleHttpRequest should be sending back a 503 if its
>> containerQueue fills up (default capacity of 50 requests that have been
>> accepted but not processed in an onTrigger()) [1]. Also, the default
>> thread pool the Jetty server is using should be able to create up to 200
>> threads to accept connections, and the handler is using an async context,
>> so the in-flight flowfiles shouldn't be holding up new requests.
>>
>> If you're not seeing 503s it might be on the sender side of the equation.
>> Is the sender doing POSTs concurrently, or waiting on each to complete
>> before sending another?
>>
>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>>
>> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <[email protected]> wrote:
>>
>> Much of this goodness can be found in Help -> Users Guide.
>> Adjusting run duration/scheduling factors:
>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>>
>> These are the latest docs, but I'm sure there is coverage in the older
>> stuff.
>>
>> Thanks
>>
>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <[email protected]> wrote:
>>
>> > Yes sir! Sure am. And I know, because I have committed that very silly
>> > mistake before.
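Bryan's sender-side question (serial vs. concurrent POSTs) can be checked with a small client sketch: a serial sender caps throughput at one request per round trip, so overlapping requests from the sender may be all that's needed. The URL, worker count, and payloads below are illustrative assumptions, not taken from the thread:

```python
# Sketch of a concurrent sender, to test whether the ~100 files / 5 min
# rate is limited by the client waiting on each POST before the next.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def post_one(url, payload):
    # Send one JSON document, return the HTTP status code.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def post_all(url, payloads, workers=10):
    # Overlapping requests exercises HandleHttpRequest's async handling;
    # compare wall-clock time here against the serial sender's rate.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: post_one(url, p), payloads))

if __name__ == "__main__":
    statuses = post_all("http://localhost:8446/", [{"n": i} for i in range(100)])
    print(statuses.count(200), "of", len(statuses), "succeeded")
```

If this concurrent sender is markedly faster than the real source, the bottleneck is the sender's serial behavior rather than NiFi.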
>> > We are indeed seeing # responses = # requests. -Jim
>> >
>> > On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <[email protected]> wrote:
>> >>
>> >> Hey James,
>> >>
>> >> Are you making sure that every route from HandleHttpRequest goes to a
>> >> HandleHttpResponse? If not, the StandardHttpContextMap may be filling
>> >> up with requests, which would probably delay processing.
>> >>
>> >> Thanks,
>> >> Bryan
>> >>
>> >> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <[email protected]> wrote:
>> >>>
>> >>> Thank you very much Matt. I have cranked the Concurrent Tasks config
>> >>> parameter on my ExecuteScript up to 20, and judging by the empty
>> >>> queue feeding that processor, it is screaming through the flowfiles
>> >>> arriving at its doorstep.
>> >>>
>> >>> Can anyone comment on performance optimizations for
>> >>> HandleHttpRequest? In your experience, is HandleHttpRequest a
>> >>> bottleneck? I do notice that I often have a count in the processor
>> >>> for "flowfiles in process", anywhere from 1 to 10 when it does show
>> >>> such a count.
>> >>>
>> >>> -Jim
>> >>>
>> >>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
>> >>>>
>> >>>> Jim,
>> >>>>
>> >>>> One quick thing you can try is to use GenerateFlowFile to send to
>> >>>> your ExecuteScript instead of HandleHttpRequest. You can configure
>> >>>> it to send whatever body with whatever attributes (such as you would
>> >>>> get from HandleHttpRequest) and send files at whatever rate the
>> >>>> processor is scheduled. This takes HandleHttpRequest out of the
>> >>>> bottleneck equation; if you are getting plenty of throughput without
>> >>>> HandleHttpRequest, then that's probably your bottleneck.
>> >>>>
>> >>>> I'm not sure offhand about optimizations for HandleHttpRequest;
>> >>>> perhaps someone else will jump in :)
>> >>>>
>> >>>> Regards,
>> >>>> Matt
>> >>>>
>> >>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>> >>>> >
>> >>>> > I am receiving POSTs from a Pentaho process, delivering files to
>> >>>> > the HandleHttpRequest processor in my NiFi 0.7.x workflow. That
>> >>>> > processor hands the flowfile off to an ExecuteScript processor
>> >>>> > that runs a Python script. This script is very, very simple: it
>> >>>> > takes an incoming JSON object, loads it into a Python dictionary,
>> >>>> > and verifies the presence of required fields using simple has_key
>> >>>> > checks on the dictionary. There are only eight fields in the
>> >>>> > incoming JSON object.
>> >>>> >
>> >>>> > The throughput for these two processes is not exceeding 100-150
>> >>>> > files in five minutes. That seems very slow in light of the
>> >>>> > minimal processing going on in these two steps.
>> >>>> >
>> >>>> > I notice that there are configuration options seemingly related
>> >>>> > to optimizing performance. "Concurrent tasks", for example, is
>> >>>> > set by default to only 1 for each processor.
>> >>>> >
>> >>>> > What performance optimizations at the processor level do users
>> >>>> > recommend? Is it advisable to crank up the concurrent tasks for a
>> >>>> > processor, and is there an optimal performance point beyond which
>> >>>> > you should not crank up that value? Are there trade-offs?
>> >>>> >
>> >>>> > I am particularly interested in optimizations for the
>> >>>> > HandleHttpRequest and ExecuteScript processors.
>> >>>> >
>> >>>> > Thanks in advance for your thoughts.
>> >>>> >
>> >>>> > cheers,
>> >>>> >
>> >>>> > Jim
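For reference, the validation step Jim describes (parse the incoming JSON, check eight required fields) might look roughly like the sketch below. The field names are invented placeholders (the thread only says there are eight), and inside NiFi this would run as a Jython 2.x ExecuteScript body reading the flowfile rather than a bare function; `field in record` is the modern spelling of the `has_key` check he mentions:

```python
# Rough reconstruction of the ExecuteScript validation logic: parse the
# POSTed JSON body and confirm the required fields are present.
import json

REQUIRED_FIELDS = (
    "id", "timestamp", "source", "type",
    "payload", "checksum", "version", "origin",
)  # hypothetical names; the thread only says "eight fields"

def validate(body):
    """Return (ok, missing_fields) for one JSON document."""
    try:
        record = json.loads(body)
    except ValueError:
        return False, ["<invalid JSON>"]
    # Equivalent to record.has_key(f) in the Jython 2.x the thread implies.
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    return not missing, missing
```

Work this light should process thousands of documents per minute in a single thread, which supports the diagnosis elsewhere in the thread that the script itself is not the bottleneck.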
