Thanks very much Juan. I do not find Wireshark in the apps available to me, but will ask our infrastructure folks about that this morning. -Jim
On Wed, Apr 5, 2017 at 4:01 PM, Juan Sequeiros <[email protected]> wrote:

> If you have Wireshark you could use:
>
>     tshark -f "port 8446 or port 9448"
>
> On Wed, Apr 5, 2017 at 3:45 PM James McMahon <[email protected]> wrote:
>
>> Thank you Bryan. I will explore these things. I suspect we are not
>> receiving from the source optimally. The reason I say that is this: I am
>> doing manual refreshes of my flow page every 3 to 4 seconds. Frequently I
>> go through 3 or 4 refreshes and no figures change in my queues or in my
>> processors. It seems like my workflow is just sitting there waiting for
>> new arrivals.
>>
>> I am using ports 8446 and 9448 (I have two HandleHttpRequest processors
>> now). Does anyone know of a few commands I can use to monitor incoming
>> POSTs arriving at my ports? Is this something I can monitor using the FF
>> developer features? -Jim
>>
>> On Wed, Apr 5, 2017 at 3:39 PM, Bryan Rosander <[email protected]> wrote:
>>
>> This seems to have gotten lost in the chain, resending (please disregard
>> if you've already read/tried it):
>>
>> Another thing to consider is whether the bottleneck is in NiFi or before
>> the data gets there. Is the source of data capable of making POST requests
>> more quickly than that, as configured? Is network latency or throughput a
>> limitation? You might try posting to another HTTP server to see whether
>> the problem is within NiFi.
>>
>> E.g., modify something like https://gist.github.com/bradmontgomery/2219997
>> to log requests and see if the rate is similar even when no other
>> processing is done on the server side.
>>
>> If you go with the Python server, you may want to use the threading mixin
>> as well:
>>
>> http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python
>>
>> Thanks,
>> Bryan
>>
>> On Wed, Apr 5, 2017 at 3:19 PM, James McMahon <[email protected]> wrote:
>>
>> We are not seeing 503s.
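Bryan's suggestion (a standalone server that logs requests, with the threading mixin so concurrent POSTs overlap) might look roughly like this in Python 3; the gist he links is Python 2-era, and the port here just mirrors one of the HandleHttpRequest ports from the thread, so treat the details as assumptions:

```python
# A minimal sketch of Bryan's "post to another HTTP server" test:
# accept POSTs, log each one, do no other processing. Counting the
# per-request log lines gives the raw arrival rate outside NiFi.
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handle each request in its own thread so concurrent POSTs overlap."""
    daemon_threads = True  # don't block shutdown on in-flight requests

class LoggingPostHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the body so the sender isn't blocked
        self.send_response(200)  # also logs one line per request to stderr
        self.end_headers()
        self.wfile.write(b"ok")

if __name__ == "__main__":
    # 8446 is assumed here only because it is one of the NiFi ports above.
    ThreadedHTTPServer(("0.0.0.0", 8446), LoggingPostHandler).serve_forever()
```

If the sender achieves a similar rate against this server, the bottleneck is likely upstream of NiFi rather than in HandleHttpRequest.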
>> We have tried setting up a second HandleHttpRequest watching a different
>> port, and "round-robin"ing to the two ports. We made a relatively small
>> gain, from about 5 minutes for 100 files consistently to 4:40 for 100. I
>> watch my workflow, and at no point does a large number of flowfiles queue
>> up in any queue leading into or coming out of any processor.
>>
>> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <[email protected]> wrote:
>>
>> It looks like HandleHttpRequest should be sending back a 503 if its
>> containerQueue fills up (default capacity of 50 requests that have been
>> accepted but not processed in an onTrigger()) [1]. Also, the default
>> thread pool the Jetty server is using should be able to create up to 200
>> threads to accept connections, and the handler is using an async context,
>> so the in-flight flowfiles shouldn't be holding up new requests.
>>
>> If you're not seeing 503s it might be on the sender side of the equation.
>> Is the sender doing POSTs concurrently, or waiting on each to complete
>> before sending another?
>>
>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>>
>> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <[email protected]> wrote:
>>
>> Much of this goodness can be found in Help -> Users Guide.
>> Adjusting run duration/scheduling factors:
>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>>
>> These are the latest docs, but I'm sure there is coverage in the older
>> stuff.
>>
>> Thanks
>>
>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <[email protected]> wrote:
>>
>> > Yes sir! Sure am. And I know, because I have committed that very silly
>> > mistake before.
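Bryan's sender-side question (serial vs. concurrent POSTs) can be checked with a small client sketch: a serial sender caps throughput at one request per round trip, so overlapping requests from the sender may be all that's needed. The URL, worker count, and payloads below are illustrative assumptions, not taken from the thread:

```python
# Sketch of a concurrent sender, to test whether the ~100 files / 5 min
# rate is limited by the client waiting on each POST before the next.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def post_one(url, payload):
    # Send one JSON document, return the HTTP status code.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

def post_all(url, payloads, workers=10):
    # Overlapping requests exercises HandleHttpRequest's async handling;
    # compare wall-clock time here against the serial sender's rate.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: post_one(url, p), payloads))

if __name__ == "__main__":
    statuses = post_all("http://localhost:8446/", [{"n": i} for i in range(100)])
    print(statuses.count(200), "of", len(statuses), "succeeded")
```

If this concurrent sender is markedly faster than the real source, the bottleneck is the sender's serial behavior rather than NiFi.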
>> > We are indeed seeing # responses = # requests. -Jim
>> >
>> > On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <[email protected]> wrote:
>> >>
>> >> Hey James,
>> >>
>> >> Are you making sure that every route from HandleHttpRequest goes to a
>> >> HandleHttpResponse? If not, the StandardHttpContextMap may be filling
>> >> up with requests, which would probably delay processing.
>> >>
>> >> Thanks,
>> >> Bryan
>> >>
>> >> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <[email protected]> wrote:
>> >>>
>> >>> Thank you very much Matt. I have cranked the Concurrent Tasks config
>> >>> parameter on my ExecuteScript up to 20, and judging by the empty
>> >>> queue feeding that processor, it is screaming through the flowfiles
>> >>> arriving at its doorstep.
>> >>>
>> >>> Can anyone comment on performance optimizations for
>> >>> HandleHttpRequest? In your experience, is HandleHttpRequest a
>> >>> bottleneck? I do notice that I often have a count in the processor
>> >>> for "flowfiles in process", anywhere from 1 to 10 when it does show
>> >>> such a count.
>> >>>
>> >>> -Jim
>> >>>
>> >>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
>> >>>>
>> >>>> Jim,
>> >>>>
>> >>>> One quick thing you can try is to use GenerateFlowFile to send to
>> >>>> your ExecuteScript instead of HandleHttpRequest. You can configure
>> >>>> it to send whatever body with whatever attributes (such as you would
>> >>>> get from HandleHttpRequest) and send files at whatever rate the
>> >>>> processor is scheduled. This takes HandleHttpRequest out of the
>> >>>> bottleneck equation; if you are getting plenty of throughput without
>> >>>> HandleHttpRequest, then that's probably your bottleneck.
>> >>>>
>> >>>> I'm not sure offhand about optimizations for HandleHttpRequest;
>> >>>> perhaps someone else will jump in :)
>> >>>>
>> >>>> Regards,
>> >>>> Matt
>> >>>>
>> >>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>> >>>> >
>> >>>> > I am receiving POSTs from a Pentaho process, delivering files to
>> >>>> > the HandleHttpRequest processor in my NiFi 0.7.x workflow. That
>> >>>> > processor hands the flowfile off to an ExecuteScript processor
>> >>>> > that runs a Python script. This script is very, very simple: it
>> >>>> > takes an incoming JSON object, loads it into a Python dictionary,
>> >>>> > and verifies the presence of required fields using simple has_key
>> >>>> > checks on the dictionary. There are only eight fields in the
>> >>>> > incoming JSON object.
>> >>>> >
>> >>>> > The throughput for these two processes is not exceeding 100-150
>> >>>> > files in five minutes. That seems very slow in light of the
>> >>>> > minimal processing going on in these two steps.
>> >>>> >
>> >>>> > I notice that there are configuration options seemingly related
>> >>>> > to optimizing performance. "Concurrent tasks", for example, is
>> >>>> > set by default to only 1 for each processor.
>> >>>> >
>> >>>> > What performance optimizations at the processor level do users
>> >>>> > recommend? Is it advisable to crank up the concurrent tasks for a
>> >>>> > processor, and is there an optimal performance point beyond which
>> >>>> > you should not crank up that value? Are there trade-offs?
>> >>>> >
>> >>>> > I am particularly interested in optimizations for the
>> >>>> > HandleHttpRequest and ExecuteScript processors.
>> >>>> >
>> >>>> > Thanks in advance for your thoughts.
>> >>>> >
>> >>>> > cheers,
>> >>>> >
>> >>>> > Jim
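For reference, the validation step Jim describes (parse the incoming JSON, check eight required fields) might look roughly like the sketch below. The field names are invented placeholders (the thread only says there are eight), and inside NiFi this would run as a Jython 2.x ExecuteScript body reading the flowfile rather than a bare function; `field in record` is the modern spelling of the `has_key` check he mentions:

```python
# Rough reconstruction of the ExecuteScript validation logic: parse the
# POSTed JSON body and confirm the required fields are present.
import json

REQUIRED_FIELDS = (
    "id", "timestamp", "source", "type",
    "payload", "checksum", "version", "origin",
)  # hypothetical names; the thread only says "eight fields"

def validate(body):
    """Return (ok, missing_fields) for one JSON document."""
    try:
        record = json.loads(body)
    except ValueError:
        return False, ["<invalid JSON>"]
    # Equivalent to record.has_key(f) in the Jython 2.x the thread implies.
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    return not missing, missing
```

Work this light should process thousands of documents per minute in a single thread, which supports the diagnosis elsewhere in the thread that the script itself is not the bottleneck.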
