Hey Ryan,

I tried to replicate the behavior that you’re seeing. I wasn’t seeing behavior as slow as what you’re mentioning, but was definitely seeing significantly slower performance than I would have expected (reached about 1.5 million/5 mins on my laptop, would expect about 8-10 million/5 mins). Did some quick profiling and see that it’s due to the NiFi session not handling a large number of Provenance Route events well. I created a Jira for this [1].

Interestingly, in the interim, you may get better performance by using a Run Duration of 0 millis instead of 1 second. That would end up being more expensive in other ways but would avoid the issue found in NIFI-7812. Hard to know for sure if it would help without trying it out to see.

Hope this helps!
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7812

On Sep 15, 2020, at 5:42 PM, Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:

Hi Mark,

I'm using Next Available, and the Destination Queues are set with Zero (0) for Back Pressure and Size threshold, so the destinations should not fill up.

I did switch to using RoundRobin and set it to a yield of 0. That got me up to about 300,000 ff's / 5 minutes. I was hoping for something around 1,000,000 ff / 5 minutes.

The overall flow looks a bit like this:
Large amount of flow files -> Distribute Load -> PutElasticsearchHttp

Ryan

On Tue, Sep 15, 2020 at 4:55 PM Mark Payne <marka...@hotmail.com> wrote:

Ryan,

I presume you’re using the Round Robin strategy? Looks like that strategy will yield the processor if any destination is full. And it sounds like that will be very common in your case. Would recommend configuring the Processor and in the Settings tab, set the Yield Duration to “0 secs”. I suspect that will give you dramatically better performance.

Thanks
-Mark

> On Sep 15, 2020, at 4:41 PM, Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:
>
> Hello,
> I've got 1 million plus FlowFiles (nothing I can do about the count), that
> go to a DistributeLoad. The DistributeLoad with 2 threads, a run duration of
> 1 sec can only sustain ~200,000 FlowFiles / five minutes.
>
> Is there a better design pattern or a processor that takes a Batch Size to
> split a Relationship into two or more?
>
> Thanks,
> Ryan
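
For easier comparison, the throughput figures quoted in this thread can be converted to per-second rates. This is only a rough sketch in Python: the numbers are the ones from the messages, while the labels and the `per_second` helper are made up for illustration and are not part of NiFi.

```python
# Throughput figures quoted in the thread, in FlowFiles per 5 minutes.
# Only the numbers come from the messages; the labels are illustrative.
WINDOW_SECONDS = 5 * 60

rates_per_5_min = {
    "Ryan, original (2 threads, 1 sec run duration)": 200_000,
    "Ryan, Round Robin with a yield of 0": 300_000,
    "Ryan, hoped for": 1_000_000,
    "Mark, observed on laptop": 1_500_000,
    "Mark, expected (low end)": 8_000_000,
    "Mark, expected (high end)": 10_000_000,
}


def per_second(flowfiles_per_window: int, window_seconds: int = WINDOW_SECONDS) -> float:
    """Convert a FlowFiles-per-window count into FlowFiles per second."""
    return flowfiles_per_window / window_seconds


if __name__ == "__main__":
    for label, rate in rates_per_5_min.items():
        print(f"{label}: {per_second(rate):,.0f} FlowFiles/sec")
```

Seen this way, the gap between the rate Ryan measured and the rate Mark would expect is more than an order of magnitude, which fits with the bottleneck being in how the session handles the events rather than in the load-distribution strategy itself.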