Re: Performance of DistributeLoad - Batch Size

Mark Payne Tue, 15 Sep 2020 16:05:11 -0700

Hey Ryan,

I tried to replicate the behavior that you’re seeing. I didn’t see throughput 
as low as what you’re describing, but it was definitely significantly lower 
than I would have expected (about 1.5 million FlowFiles / 5 mins on my laptop, 
where I would expect about 8-10 million / 5 mins). I did some quick profiling 
and found that it’s due to the NiFi session not handling a large number of 
Provenance Route events well. I created a Jira for this [1]. Interestingly, in 
the interim, you may get better performance by using a Run Duration of 0 millis 
instead of 1 second. That would end up being more expensive in other ways but 
would avoid the issue found in NIFI-7812. Hard to know for sure if it would 
help without trying it out to see.
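
For scale, the figures quoted in this thread can be converted to per-second rates. The following is only a rough sketch: the labels and the helper name are illustrative, and the counts are taken from the messages in this thread.

```python
# Throughput figures quoted in this thread, as FlowFiles per 5-minute window.
# Labels and the helper name are illustrative, not part of NiFi.
WINDOW_SECONDS = 5 * 60

observed = {
    "Ryan, original (2 threads, 1 sec run duration)": 200_000,
    "Ryan, Round Robin with yield of 0": 300_000,
    "Ryan, target": 1_000_000,
    "Mark, laptop reproduction": 1_500_000,
    "Mark, expected (low end)": 8_000_000,
}


def per_second(count_per_window, window_seconds=WINDOW_SECONDS):
    """Convert a count per window into an average per-second rate."""
    return count_per_window / window_seconds


rates = {label: per_second(count) for label, count in observed.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:,.0f} FlowFiles/sec")
```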

Hope this helps!
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7812

On Sep 15, 2020, at 5:42 PM, Ryan Hendrickson 
<ryan.andrew.hendrick...@gmail.com> 
wrote:

Hi Mark,
   I'm using Next Available, and the Destination Queues are set to zero (0) 
for both Back Pressure thresholds (object count and size), so the 
destinations should not fill up.

   I did switch to using Round Robin and set the Yield Duration to 0.  That 
got me up to about 300,000 FlowFiles / 5 minutes.  I was hoping for something 
around 1,000,000 FlowFiles / 5 minutes.

   The overall flow looks a bit like this: Large amount of FlowFiles -> 
DistributeLoad -> PutElasticsearchHttp.

Ryan

On Tue, Sep 15, 2020 at 4:55 PM Mark Payne 
<marka...@hotmail.com> wrote:
Ryan,

I presume you’re using the Round Robin strategy? Looks like that strategy will 
yield the processor if any destination is full. And it sounds like that will be 
very common in your case. Would recommend configuring the Processor and in the 
Settings tab, set the Yield Duration to “0 secs”. I suspect that will give you 
dramatically better performance.
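
To illustrate why the Yield Duration can dominate, here is a toy model in Python. It is not NiFi's scheduler: the function name, the batch size, the per-invocation work time, and the fraction of invocations that find a full destination are all invented for illustration, and the 1 second yield is just an assumed starting value.

```python
def toy_rate(batch_size, work_seconds, yield_seconds, full_fraction):
    """FlowFiles/sec for a toy processor (not NiFi's actual scheduler).

    On a fraction `full_fraction` of invocations a destination is full: the
    processor moves nothing and waits `yield_seconds`. On the remaining
    invocations it moves `batch_size` FlowFiles in `work_seconds`.
    """
    moved = (1.0 - full_fraction) * batch_size
    elapsed = (1.0 - full_fraction) * work_seconds + full_fraction * yield_seconds
    return moved / elapsed if elapsed > 0 else 0.0


# All inputs below are invented for illustration.
with_yield = toy_rate(batch_size=1000, work_seconds=0.01,
                      yield_seconds=1.0, full_fraction=0.5)
no_yield = toy_rate(batch_size=1000, work_seconds=0.01,
                    yield_seconds=0.0, full_fraction=0.5)
print(f"yield 1 sec: {with_yield:,.0f}/sec, yield 0 secs: {no_yield:,.0f}/sec")
```

In this model the time spent yielding swamps the time spent doing work whenever destinations are frequently full, which is the effect the setting change is meant to remove. A yield of 0 trades that idle time for more frequent invocations.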

Thanks
-Mark

> On Sep 15, 2020, at 4:41 PM, Ryan Hendrickson 
> <ryan.andrew.hendrick...@gmail.com> 
> wrote:
>
> Hello,
>    I've got 1 million plus FlowFiles (nothing I can do about the count) that 
> go to a DistributeLoad.  With 2 threads and a Run Duration of 1 sec, the 
> DistributeLoad can only sustain ~200,000 FlowFiles / five minutes.
>
>    Is there a better design pattern or a processor that takes a Batch Size to 
> split a Relationship into two or more?
>
> Thanks,
> Ryan
