Re: Performance of DistributeLoad - Batch Size

Ryan Hendrickson Tue, 15 Sep 2020 18:48:48 -0700

Thanks Mark - I was not expecting a bug report out of this!  I'll give the
Run Duration of 0 millis a try tomorrow and see what happens.  In fairness, your laptop is
probably more powerful than the virtual CPUs I'm running on :-).


@Ryan I've got to learn the Record stuff better than I have now... It's the
whole complicated schema thing that has kept me away for far too long...

Ryan

On Tue, Sep 15, 2020 at 7:04 PM Mark Payne <[email protected]> wrote:

> Hey Ryan,
>
> I tried to replicate the behavior that you’re seeing. I wasn’t seeing
> behavior as slow as what you’re mentioning, but was definitely seeing
> significantly slower performance than I would have expected (reached about
> 1.5 million/5 mins on my laptop, would expect about 8-10 million/5 mins).
> Did some quick profiling and see that it’s due to the NiFi session not
> handling a large number of Provenance Route events well. I created a Jira
> for this [1]. Interestingly, in the interim, you may get better performance
> by using a Run Duration of 0 millis instead of 1 second. That would end up
> being more expensive in other ways but would avoid the issue found in
> NIFI-7812. Hard to know for sure if it would help without trying it out to
> see.
>
> Hope this helps!
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-7812
>
>
>
> On Sep 15, 2020, at 5:42 PM, Ryan Hendrickson <
> [email protected]> wrote:
>
> Hi Mark,
>    I'm using Next Available, and the Destination Queues are set with Zero
> (0) for Back Pressure and Size threshold, so the destinations should not
> fill up.
>
>    I did switch to using Round Robin and set the Yield Duration to 0.  That
> got me up to about 300,000 FlowFiles / 5 minutes.  I was hoping for
> something around 1,000,000 FlowFiles / 5 minutes.
>
>    The overall flow looks a bit like this: large number of FlowFiles ->
> DistributeLoad -> PutElasticsearchHttp.
>
> Ryan
>
> On Tue, Sep 15, 2020 at 4:55 PM Mark Payne <[email protected]> wrote:
>
>> Ryan,
>>
>> I presume you’re using the Round Robin strategy? Looks like that strategy
>> will yield the processor if any destination is full. And it sounds like
>> that will be very common in your case. Would recommend configuring the
>> Processor and in the Settings tab, set the Yield Duration to “0 secs”. I
>> suspect that will give you dramatically better performance.
>>
>> Thanks
>> -Mark
>>
>>
>> > On Sep 15, 2020, at 4:41 PM, Ryan Hendrickson <
>> [email protected]> wrote:
>> >
>> > Hello,
>> >    I've got 1 million plus FlowFiles (nothing I can do about the
>> count) that go to a DistributeLoad.  With 2 threads and a run duration of
>> 1 sec, the DistributeLoad can only sustain ~200,000 FlowFiles / five minutes.
>> >
>> >    Is there a better design pattern or a processor that takes a Batch
>> Size to split a Relationship into two or more?
>> >
>> > Thanks,
>> > Ryan
>>
>>
>
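
For comparison, the throughput figures quoted in this thread can be put on a common per-second scale. The following is only a rough sketch in Python: the labels and the helper name per_second are made up here for illustration, and the counts are the approximate numbers reported in the messages above, not new measurements.

```python
# Approximate throughput figures quoted in the thread, in FlowFiles per 5 minutes.
# The labels are descriptive names chosen for this sketch only.
WINDOW_SECONDS = 5 * 60

reported = {
    "original setup (2 threads, 1 sec run duration)": 200_000,
    "Round Robin with a yield of 0": 300_000,
    "hoped-for rate": 1_000_000,
    "Mark's laptop, observed": 1_500_000,
    "Mark's laptop, expected (low end)": 8_000_000,
    "Mark's laptop, expected (high end)": 10_000_000,
}


def per_second(flowfiles_per_window, window_seconds=WINDOW_SECONDS):
    """Convert a count per measurement window into FlowFiles per second."""
    return flowfiles_per_window / window_seconds


rates = {label: per_second(count) for label, count in reported.items()}

for label, rate in rates.items():
    print(f"{label}: ~{rate:,.0f} FlowFiles/sec")
```

On this scale the original setup is well under 1,000 FlowFiles/sec, while the rate Mark expected on his laptop is in the tens of thousands per second.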
