We are not seeing 503s. We have tried setting up a second HandleHttpRequest watching a different port and round-robining across the two ports. That yielded only a modest gain, from a consistent 5 minutes for 100 files down to about 4:40 for 100. I have watched my workflow, and at no point does a large number of flowfiles queue up in any connection leading into or out of any processor.
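In case it is relevant to Bryan's question below about the sender side, here is a rough sketch of the kind of concurrent sender we could use to rule that out. The host, ports, path, and payload are placeholders, not our actual values:

    # Rough sketch only: POST the test files in parallel rather than one at a time,
    # round-robining across the two HandleHttpRequest ports.
    # nifi-host, the ports, the path, and the payload are placeholders.
    import json
    import urllib.error
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    ENDPOINTS = ["http://nifi-host:8011/testPath",
                 "http://nifi-host:8012/testPath"]

    def post_one(i):
        body = json.dumps({"field_%d" % n: "value" for n in range(8)}).encode("utf-8")
        url = ENDPOINTS[i % len(ENDPOINTS)]              # alternate between the two ports
        req = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.getcode()                    # 200 once HandleHttpResponse replies
        except urllib.error.HTTPError as e:
            return e.code                                # e.g. 503 if the request queue is full

    with ThreadPoolExecutor(max_workers=20) as pool:
        codes = list(pool.map(post_one, range(100)))     # 100 files, up to 20 in flight
    print("%d of %d returned 200" % (codes.count(200), len(codes)))

If throughput scales with the worker count here, the bottleneck was the serial sender; if it stays near the 4-5 minute mark, it is on the NiFi side.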
On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander <[email protected]> wrote:
> It looks like HandleHttpRequest should be sending back a 503 if its
> containerQueue fills up (default capacity of 50 requests that have been
> accepted but not processed in an onTrigger()) [1]. Also, the default
> thread pool the jetty server is using should be able to create up to 200
> threads to accept connections, and the handler is using an async context, so
> the in-flight flow files shouldn't be holding up new requests.
>
> If you're not seeing 503s it might be on the sender side of the equation.
> Is the sender doing posts concurrently or waiting on each to complete
> before sending another?
>
> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java#L395
>
> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt <[email protected]> wrote:
>
>> Much of this goodness can be found in the Help -> Users Guide.
>> Adjusting run duration/scheduling factors:
>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab
>>
>> These are the latest docs but I'm sure there is coverage in the older
>> stuff.
>>
>> Thanks
>>
>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon <[email protected]> wrote:
>> > Yes sir! Sure am. And I know, because I have committed that very silly
>> > mistake before. We are indeed seeing # responses = # requests. -Jim
>> >
>> > On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander <[email protected]> wrote:
>> >>
>> >> Hey James,
>> >>
>> >> Are you making sure that every route from HandleHttpRequest goes to a
>> >> HandleHttpResponse? If not, the StandardHttpContextMap may be filling up
>> >> with requests, which would probably delay processing.
>> >>
>> >> Thanks,
>> >> Bryan
>> >>
>> >> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon <[email protected]> wrote:
>> >>>
>> >>> Thank you very much Matt. I have cranked my Concurrent Tasks config param
>> >>> on my ExecuteScripts up to 20, and judging by the empty queue feeding that
>> >>> processor it is screaming through the flowfiles arriving at its doorstep.
>> >>>
>> >>> Can anyone comment on performance optimizations for HandleHttpRequest? In
>> >>> your experience, is HandleHttpRequest a bottleneck? I do notice that I
>> >>> often have a count in the processor for "flowfiles in process" within the
>> >>> processor, anywhere from 1 to 10 when it does show such a count.
>> >>>
>> >>> -Jim
>> >>>
>> >>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess <[email protected]> wrote:
>> >>>>
>> >>>> Jim,
>> >>>>
>> >>>> One quick thing you can try is to use GenerateFlowFile to send to your
>> >>>> ExecuteScript instead of HandleHttpRequest. You can configure it to
>> >>>> send whatever body with whatever attributes (such as you would get
>> >>>> from HandleHttpRequest) and send files at whatever rate the processor
>> >>>> is scheduled. This might take ExecuteScript out of the bottleneck
>> >>>> equation; if you are getting plenty of throughput without
>> >>>> HandleHttpRequest then that's probably your bottleneck.
>> >>>>
>> >>>> I'm not sure offhand about optimizations for HandleHttpRequest;
>> >>>> perhaps someone else will jump in :)
>> >>>>
>> >>>> Regards,
>> >>>> Matt
>> >>>>
>> >>>> On Wed, Apr 5, 2017 at 1:48 PM, James McMahon <[email protected]> wrote:
>> >>>> > I am receiving POSTs from a Pentaho process, delivering files to my
>> >>>> > NiFi 0.7.x workflow's HandleHttpRequest processor. That processor hands
>> >>>> > the flowfile off to an ExecuteScript processor that runs a Python script.
>> >>>> > This script is very, very simple: it takes an incoming JSON object, loads
>> >>>> > it into a Python dictionary, and verifies the presence of required fields
>> >>>> > using simple has_key checks on the dictionary. There are only eight fields
>> >>>> > in the incoming JSON object.
>> >>>> >
>> >>>> > The throughput for these two processors is not exceeding 100-150 files
>> >>>> > in five minutes. That seems very slow in light of the minimal processing
>> >>>> > going on in these two steps.
>> >>>> >
>> >>>> > I notice that there are configuration options seemingly related to
>> >>>> > optimizing performance. "Concurrent Tasks", for example, is only set
>> >>>> > to 1 by default for each processor.
>> >>>> >
>> >>>> > What performance optimizations at the processor level do users recommend?
>> >>>> > Is it advisable to crank up the concurrent tasks for a processor, and is
>> >>>> > there an optimal performance point beyond which you should not crank up
>> >>>> > that value? Are there trade-offs?
>> >>>> >
>> >>>> > I am particularly interested in optimizations for the HandleHttpRequest
>> >>>> > and ExecuteScript processors.
>> >>>> >
>> >>>> > Thanks in advance for your thoughts.
>> >>>> >
>> >>>> > cheers,
>> >>>> >
>> >>>> > Jim
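For anyone picking this thread up later, a minimal sketch of the kind of ExecuteScript Jython body described in the original post might look roughly like the following. The required field names are illustrative placeholders, not the actual script:

    # Minimal sketch, not the actual script: read the flowfile content as JSON
    # and check for required fields, routing to success or failure.
    import json
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import InputStreamCallback

    class ReadJson(InputStreamCallback):
        def __init__(self):
            self.data = None
        def process(self, inputStream):
            self.data = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))

    REQUIRED = ["field_a", "field_b"]   # placeholders for the eight required fields

    flowFile = session.get()
    if flowFile is not None:
        reader = ReadJson()
        session.read(flowFile, reader)
        if all(reader.data.has_key(k) for k in REQUIRED):
            session.transfer(flowFile, REL_SUCCESS)
        else:
            session.transfer(flowFile, REL_FAILURE)

A script this small should not take seconds per file, which is why the suggestion above to swap in GenerateFlowFile is a good way to isolate whether HandleHttpRequest or ExecuteScript is the slow step.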
