If you have Wireshark, you could use: tshark -f "port 8446 or port 9448"
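To watch specifically for the incoming POSTs rather than every packet on those ports, a slightly fuller invocation (the "any" interface is an assumption; use whichever interface NiFi listens on, and the -d options are needed because HTTP is not auto-dissected on non-standard ports): tshark -i any -f "tcp port 8446 or tcp port 9448" -d tcp.port==8446,http -d tcp.port==9448,http -Y "http.request.method == POST". That prints a line per arriving POST, which should show quickly whether the sender is actually delivering at the rate you expect or the flow really is just waiting.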
On Wed, Apr 5, 2017 at 3:45 PM James McMahon <[email protected]> wrote:
> Thank you Bryan. I will explore these things. I suspect we are not receiving from the source optimally. Reason I say that is this: I am doing manual refreshes on my flow page every 3 to 4 seconds. Frequently I go through 3 or 4 refreshes, and no figures change in my queues nor in my processors. Seems like my workflow is just sitting there waiting for new arrivals.
>
> I am using ports 8446 and 9448 (I have two HandleHttpRequest processors now). Anyone know of a few commands I can use to monitor arrivals of incoming POSTs at my ports? Is this something I can monitor using the FF developer features? -Jim
>
> On Wed, Apr 5, 2017 at 3:39 PM, Bryan Rosander <[email protected]> wrote:
> This seems to have gotten lost in the chain, resending (please disregard if you've already read/tried it):
>
> Another thing to consider is whether the bottleneck is in NiFi or before the data gets there. Is the source of data, as configured, capable of making POST requests more quickly than that? Is network latency or throughput a limitation? You might try posting to another HTTP server to see whether the problem is within NiFi.
>
> E.g. modify something like https://gist.github.com/bradmontgomery/2219997 to log requests and see if the rate is similar even when no other processing is done on the server side.
>
> If you go with the Python server, you may want to use the threading mixin as well: http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python
>
> Thanks,
> Bryan
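For anyone who wants to run that experiment, here is a rough, untested sketch of that kind of throwaway server (Python 2, to match the linked gist and Stack Overflow answer; the listen port is arbitrary). It handles each POST on its own thread, discards the body, and prints a running rate:

import time
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
from SocketServer import ThreadingMixIn

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handles each request in its own thread (the threading mixin)."""
    pass

class CountingHandler(BaseHTTPRequestHandler):
    count = 0
    started = None

    def do_POST(self):
        if CountingHandler.started is None:
            CountingHandler.started = time.time()
        # Read and discard the body so the sender is not left blocked.
        length = int(self.headers.getheader('content-length') or 0)
        self.rfile.read(length)
        CountingHandler.count += 1
        elapsed = max(time.time() - CountingHandler.started, 0.001)
        print '%d POSTs in %.1fs (%.1f per minute)' % (
            CountingHandler.count, elapsed, 60.0 * CountingHandler.count / elapsed)
        self.send_response(200)
        self.end_headers()
        self.wfile.write('ok')

    def log_message(self, format, *args):
        pass  # suppress per-request access logging; the rate line above is enough

if __name__ == '__main__':
    # 9999 is just a test port; point the Pentaho sender (or a test client) at it.
    ThreadedHTTPServer(('0.0.0.0', 9999), CountingHandler).serve_forever()

If this server also tops out around 100-150 files per five minutes, the limit is on the sender or the network rather than inside NiFi.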
> On Wed, Apr 5, 2017 at 3:19 PM, James McMahon <[email protected]> wrote:
> We are not seeing 503s. We have tried setting up a second HandleHttpRequest, watching a different port, and "round robin'ing" to the two ports. We made a relatively small gain, from about 5 minutes for 100 files consistently to 4:40 for 100. I watch my workflow, and at no point does a large number of flowfiles queue up in any queue leading into or coming out of any processor.
>
> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <[email protected]> wrote:
> It looks like HandleHttpRequest should be sending back a 503 if its containerQueue fills up (default capacity of 50 requests that have been accepted but not processed in an onTrigger()) [1]. Also, the default thread pool the Jetty server is using should be able to create up to 200 threads to accept connections, and the handler is using an async context, so the in-flight flow files shouldn't be holding up new requests.
>
> If you're not seeing 503s, it might be on the sender side of the equation. Is the sender doing POSTs concurrently, or waiting on each to complete before sending another?
>
> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
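A quick way to answer that sender-side question independently of Pentaho is to drive the POSTs with a small concurrent test client and see whether the rate moves. A rough sketch (Python 2 with the requests library; the URL, payload, and counts are placeholders, not taken from Jim's setup):

import json, time
from multiprocessing.dummy import Pool  # thread-backed pool, nothing extra to install
import requests

URL = 'http://nifi-host:8446/'                 # placeholder NiFi host and port
PAYLOAD = json.dumps({'field1': 'value1'})     # placeholder body

def post_one(_):
    r = requests.post(URL, data=PAYLOAD, headers={'Content-Type': 'application/json'})
    return r.status_code

started = time.time()
codes = Pool(10).map(post_one, range(100))     # 100 POSTs, up to 10 in flight at once
print '%d POSTs in %.1fs, status codes seen: %s' % (
    len(codes), time.time() - started, sorted(set(codes)))

If ten concurrent senders push the rate well past 100 files per five minutes, the original sender is probably posting serially and waiting on each response.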
> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <[email protected]> wrote:
> Much of this goodness can be found in the help -> Users Guide. Adjusting run duration/scheduling factors:
>
> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>
> These are the latest docs but I'm sure there is coverage in the older stuff.
>
> Thanks
>
> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <[email protected]> wrote:
>> Yes sir! Sure am. And I know, because I have committed that very silly mistake before. We are indeed seeing # responses = # requests. -Jim
>>
>> On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <[email protected]> wrote:
>>> Hey James,
>>>
>>> Are you making sure that every route from HandleHttpRequest goes to a HandleHttpResponse? If not, the StandardHttpContextMap may be filling up with requests, which would probably delay processing.
>>>
>>> Thanks,
>>> Bryan
>>>
>>> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <[email protected]> wrote:
>>>> Thank you very much Matt. I have cranked the Concurrent Tasks config param on my ExecuteScripts up to 20, and judging by the empty queue feeding that processor, it is screaming through the flowfiles arriving at its doorstep.
>>>>
>>>> Can anyone comment on performance optimizations for HandleHttpRequest? In your experiences, is HandleHttpRequest a bottleneck? I do notice that the processor often shows a "flowfile in process" count, anywhere from 1 to 10 when it does show such a count.
>>>>
>>>> -Jim
>>>>
>>>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
>>>>> Jim,
>>>>>
>>>>> One quick thing you can try is to use GenerateFlowFile to send to your ExecuteScript instead of HandleHttpRequest; you can configure it to send whatever body with whatever attributes (such as you would get from HandleHttpRequest) and send files at whatever rate the processor is scheduled. This takes HandleHttpRequest out of the bottleneck equation; if you are getting plenty of throughput without HandleHttpRequest then that's probably your bottleneck.
>>>>>
>>>>> I'm not sure offhand about optimizations for HandleHttpRequest, perhaps someone else will jump in :)
>>>>>
>>>>> Regards,
>>>>> Matt
>>>>>
>>>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>>>>>> I am receiving POSTs from a Pentaho process, delivering files to the HandleHttpRequest processor in my NiFi 0.7.x workflow. That processor hands the flowfile off to an ExecuteScript processor that runs a Python script. This script is very, very simple: it takes an incoming JSON object, loads it into a Python dictionary, and verifies the presence of required fields using simple has_key checks on the dictionary. There are only eight fields in the incoming JSON object.
>>>>>>
>>>>>> The throughput for these two processes is not exceeding 100-150 files in five minutes. It seems very slow in light of the minimal processing going on in these two steps.
>>>>>>
>>>>>> I notice that there are configuration options seemingly related to optimizing performance. "Concurrent tasks", for example, is only set by default to 1 for each processor.
>>>>>>
>>>>>> What performance optimizations at the processor level do users recommend? Is it advisable to crank up the concurrent tasks for a processor, and is there an optimal point beyond which you should not crank up that value? Are there trade-offs?
>>>>>>
>>>>>> I am particularly interested in optimizations for HandleHttpRequest and ExecuteScript processors.
>>>>>>
>>>>>> Thanks in advance for your thoughts.
>>>>>>
>>>>>> cheers,
>>>>>>
>>>>>> Jim
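For reference, a Jython ExecuteScript body of roughly the shape Jim describes looks something like the sketch below (the required field names are invented for illustration; this is not his actual script):

import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

REQUIRED_FIELDS = ['field1', 'field2', 'field3']   # placeholders for the eight real fields

class ReadJson(InputStreamCallback):
    def __init__(self):
        self.payload = None
    def process(self, inputStream):
        # Read the flowfile content and parse it into a Python dictionary.
        self.payload = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))

flowFile = session.get()
if flowFile is not None:
    reader = ReadJson()
    session.read(flowFile, reader)
    # has_key is fine here because ExecuteScript runs Jython (Python 2.7 syntax).
    missing = [f for f in REQUIRED_FIELDS if not reader.payload.has_key(f)]
    if missing:
        flowFile = session.putAttribute(flowFile, 'missing.fields', ','.join(missing))
        session.transfer(flowFile, REL_FAILURE)
    else:
        session.transfer(flowFile, REL_SUCCESS)

The work per flowfile here is tiny (parse eight fields, check keys), which is consistent with the observation later in the thread that raising Concurrent Tasks on ExecuteScript emptied its input queue and shifted attention to HandleHttpRequest and the sender.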
