Thank you, Bryan. I will explore these things. I suspect we are not receiving from the source optimally. The reason I say that is this: I am doing manual refreshes on my flow page every 3 to 4 seconds. Frequently I go through 3 or 4 refreshes and no figures change in my queues or in my processors. It seems like my workflow is just sitting there waiting for new arrivals. I am using ports 8446 and 9448 (I have two HandleHttpRequest processors now). Does anyone know of a few commands I can use to monitor arrivals of incoming POSTs at my ports? Is this something I can monitor using the FF developer features? -Jim
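One way to watch those two ports directly on the NiFi host, independent of UI refreshes, is to poll the OS connection table every few seconds. Below is a minimal sketch, assuming Python with the psutil package installed (psutil is not mentioned in the thread; tcpdump or netstat filtered on ports 8446/9448 would show much the same picture):

# Rough arrival monitor: every few seconds, count TCP connections on the two
# HandleHttpRequest ports, grouped by connection state. Run on the NiFi host.
# Assumes the psutil package (pip install psutil); some platforms need root
# to see every socket.
import time
from collections import Counter

import psutil

PORTS = {8446, 9448}  # the two HandleHttpRequest listening ports


def snapshot():
    counts = Counter()
    for conn in psutil.net_connections(kind='tcp'):
        # laddr[1] is the local port; skip the listening sockets themselves
        if conn.laddr and conn.laddr[1] in PORTS and conn.status != psutil.CONN_LISTEN:
            counts[(conn.laddr[1], conn.status)] += 1
    return counts


if __name__ == '__main__':
    while True:
        print(time.strftime('%H:%M:%S'), dict(snapshot()))
        time.sleep(3)

If the counts stay empty or never change between polls, nothing is reaching the ports and the slowdown is upstream of NiFi.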
On Wed, Apr 5, 2017 at 3:39 PM, Bryan Rosander <[email protected]> wrote:
> This seems to have gotten lost in the chain, resending (please disregard if you've already read/tried it):
>
> Another thing to consider is whether the bottleneck is in NiFi or before it gets there. Is the source of data capable of making POST requests more quickly than that, as configured? Is network latency or throughput a limitation? You might try posting to another HTTP server to see whether the problem is within NiFi.
>
> E.g. modify something like https://gist.github.com/bradmontgomery/2219997 to log requests and see if the rate is similar even when no other processing is done on the server side.
>
> If you go with the Python server, you may want to use the threading mixin as well.
>
> http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python
>
> Thanks,
> Bryan
>
> On Wed, Apr 5, 2017 at 3:19 PM, James McMahon <[email protected]> wrote:
>> We are not seeing 503s. We have tried setting up a second HandleHttpRequest, watching a different port, and round-robining to the two ports. We made a relatively low gain, from about 5 minutes for 100 files consistently to 4:40 for 100. I watch my workflow, and at no point does a large number of flowfiles queue up in any queue leading into or coming out of any processor.
>>
>> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <[email protected]> wrote:
>>> It looks like HandleHttpRequest should be sending back a 503 if its containerQueue fills up (default capacity of 50 requests that have been accepted but not processed in an onTrigger()) [1]. Also, the default thread pool the Jetty server is using should be able to create up to 200 threads to accept connections, and the handler is using an async context, so the in-flight flowfiles shouldn't be holding up new requests.
>>>
>>> If you're not seeing 503s it might be on the sender side of the equation. Is the sender doing POSTs concurrently, or waiting on each to complete before sending another?
>>>
>>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>>>
>>> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <[email protected]> wrote:
>>>> Much of this goodness can be found in the help->Users Guide. Adjusting run duration/scheduling factors:
>>>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>>>>
>>>> These are the latest docs but I'm sure there is coverage in the older stuff.
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <[email protected]> wrote:
>>>>> Yes sir! Sure am. And I know, because I have committed that very silly mistake before. We are indeed seeing # responses = # requests. -Jim
>>>>>
>>>>> On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <[email protected]> wrote:
>>>>>> Hey James,
>>>>>>
>>>>>> Are you making sure that every route from HandleHttpRequest goes to a HandleHttpResponse? If not, the StandardHttpContextMap may be filling up with requests, which would probably delay processing.
>>>>>>
>>>>>> Thanks,
>>>>>> Bryan
>>>>>>
>>>>>> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <[email protected]> wrote:
>>>>>>> Thank you very much Matt. I have cranked the Concurrent Tasks config parameter on my ExecuteScripts up to 20, and judging by the empty queue feeding that processor it is screaming through the flowfiles arriving at its doorstep.
>>>>>>>
>>>>>>> Can anyone comment on performance optimizations for HandleHttpRequest? In your experience, is HandleHttpRequest a bottleneck? I do notice that I often see a "flowfile in process" count within the processor, anywhere from 1 to 10 when it does show such a count.
>>>>>>>
>>>>>>> -Jim
>>>>>>>
>>>>>>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
>>>>>>>> Jim,
>>>>>>>>
>>>>>>>> One quick thing you can try is to use GenerateFlowFile to send to your ExecuteScript instead of HandleHttpRequest; you can configure it to send whatever body with whatever attributes (such as you would get from HandleHttpRequest) and send files at whatever rate the processor is scheduled. This might take ExecuteScript out of the bottleneck equation; if you are getting plenty of throughput without HandleHttpRequest then that's probably your bottleneck.
>>>>>>>>
>>>>>>>> I'm not sure offhand about optimizations for HandleHttpRequest, perhaps someone else will jump in :)
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>>>>>>>>> I am receiving POSTs from a Pentaho process, delivering files to my NiFi 0.7.x workflow's HandleHttpRequest processor. That processor hands the flowfile off to an ExecuteScript processor that runs a Python script. This script is very, very simple: it takes an incoming JSON object, loads it into a Python dictionary, and verifies the presence of required fields using simple has_key checks on the dictionary. There are only eight fields in the incoming JSON object.
>>>>>>>>>
>>>>>>>>> The throughput for these two processes is not exceeding 100-150 files in five minutes. That seems very slow in light of the minimal processing going on in these two steps.
>>>>>>>>>
>>>>>>>>> I notice that there are configuration options seemingly related to optimizing performance. "Concurrent tasks", for example, is only set by default to 1 for each processor.
>>>>>>>>>
>>>>>>>>> What performance optimizations at the processor level do users recommend? Is it advisable to crank up the concurrent tasks for a processor, and is there an optimal point beyond which you should not crank up that value? Are there trade-offs?
>>>>>>>>>
>>>>>>>>> I am particularly interested in optimizations for the HandleHttpRequest and ExecuteScript processors.
>>>>>>>>>
>>>>>>>>> Thanks in advance for your thoughts.
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>>
>>>>>>>>> Jim
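To try Bryan's suggestion of posting to a bare HTTP server outside NiFi, something along these lines could stand in for the gist he links: a minimal sketch, assuming Python 3, that simply counts POSTs so the sender's raw rate can be compared with what NiFi receives. The ThreadingMixIn follows the StackOverflow approach he mentions, and port 8446 here just mirrors one of the HandleHttpRequest ports.

# Threaded test server that only counts incoming POSTs, so the sender's raw
# rate can be measured with NiFi out of the loop. Assumes Python 3.
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True  # handle each request in its own daemon thread


class CountingHandler(BaseHTTPRequestHandler):
    lock = threading.Lock()
    total = 0

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        self.rfile.read(length)                 # drain and discard the body
        with CountingHandler.lock:
            CountingHandler.total += 1
            total = CountingHandler.total
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'ok')
        print('%s POST #%d' % (time.strftime('%H:%M:%S'), total))

    def log_message(self, fmt, *args):
        pass                                    # silence the default per-request log line


if __name__ == '__main__':
    ThreadedHTTPServer(('', 8446), CountingHandler).serve_forever()

Pointing the Pentaho sender at this for a few minutes gives a baseline: if the count still climbs at roughly 100-150 per five minutes, the bottleneck is on the sending side rather than in NiFi.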
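For context on the ExecuteScript step described at the bottom of the chain, a Jython validation of that shape would typically look like the sketch below. This is a guess at the general pattern, not Jim's actual script; the field names are hypothetical, and it assumes the Jython engine with Apache commons-io available on the ExecuteScript classpath.

# Hypothetical sketch of the kind of "check eight required JSON fields" script
# described in the thread; session, REL_SUCCESS and REL_FAILURE are the
# standard ExecuteScript bindings.
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

REQUIRED = ['id', 'timestamp', 'source', 'type']  # placeholder names, not Jim's real fields


class CheckFields(InputStreamCallback):
    def __init__(self):
        self.missing = []

    def process(self, inputStream):
        # Read the flowfile content, parse the JSON object, note any missing fields
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        data = json.loads(text)
        self.missing = [f for f in REQUIRED if f not in data]


flowFile = session.get()
if flowFile is not None:
    callback = CheckFields()
    session.read(flowFile, callback)
    if callback.missing:
        flowFile = session.putAttribute(flowFile, 'missing.fields', ','.join(callback.missing))
        session.transfer(flowFile, REL_FAILURE)
    else:
        session.transfer(flowFile, REL_SUCCESS)

A parse plus a few membership checks like this is unlikely to cap throughput at 100-150 files per five minutes, which is consistent with the empty queue Jim reports in front of ExecuteScript and points the investigation back toward HandleHttpRequest or the sender.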
