Thanks, Mark - I was not expecting a bug report out of this! I'll give the 0 millis a try tomorrow and see what happens. In fairness, your laptop is probably more powerful than the virtual CPUs I'm running on :-).
@Ryan I've got to learn the Record stuff better than I have now... It's the whole complicated schema thing that has kept me away for far too long...

Ryan

On Tue, Sep 15, 2020 at 7:04 PM Mark Payne <[email protected]> wrote:

> Hey Ryan,
>
> I tried to replicate the behavior that you’re seeing. I wasn’t seeing
> behavior as slow as what you’re mentioning, but was definitely seeing
> significantly slower performance than I would have expected (reached about
> 1.5 million/5 mins on my laptop, would expect about 8-10 million/5 mins).
> Did some quick profiling and see that it’s due to the NiFi session not
> handling a large number of Provenance Route events well. I created a Jira
> for this [1]. Interestingly, in the interim, you may get better performance
> by using a Run Duration of 0 millis instead of 1 second. That would end up
> being more expensive in other ways but would avoid the issue found in
> NIFI-7812. Hard to know for sure if it would help without trying it out to
> see.
>
> Hope this helps!
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-7812
>
> On Sep 15, 2020, at 5:42 PM, Ryan Hendrickson <
> [email protected]> wrote:
>
> Hi Mark,
> I'm using Next Available, and the Destination Queues are set with Zero
> (0) for Back Pressure and Size threshold, so the destinations should not
> fill up.
>
> I did switch to using RoundRobin and set it to a yield of 0. That got
> me up to about 300,000 ff's / 5 minutes. I was hoping for something around
> 1,000,000 ff / 5 minutes.
>
> The overall flow looks a bit like this: Large amount of flow files ->
> DistributeLoad -> PutElasticsearchHttp.
>
> Ryan
>
> On Tue, Sep 15, 2020 at 4:55 PM Mark Payne <[email protected]> wrote:
>
>> Ryan,
>>
>> I presume you’re using the Round Robin strategy? Looks like that strategy
>> will yield the processor if any destination is full. And it sounds like
>> that will be very common in your case. Would recommend configuring the
>> Processor and in the Settings tab, set the Yield Duration to “0 secs”. I
>> suspect that will give you dramatically better performance.
>>
>> Thanks
>> -Mark
>>
>> > On Sep 15, 2020, at 4:41 PM, Ryan Hendrickson <
>> > [email protected]> wrote:
>> >
>> > Hello,
>> > I've got 1 million plus FlowFiles (nothing I can do about the
>> > count) that go to a DistributeLoad. The DistributeLoad, with 2 threads
>> > and a run duration of 1 sec, can only sustain ~200,000 FlowFiles / five
>> > minutes.
>> >
>> > Is there a better design pattern or a processor that takes a Batch
>> > Size to split a Relationship into two or more?
>> >
>> > Thanks,
>> > Ryan
