Re: Best way to tune NiFi for huge amounts of small flowfiles

2020-09-15 Thread Ryan Hendrickson
We keep our queue limit at 20,000 to keep data from swapping between ArrayLists and Prioritized Queues (see bug https://issues.apache.org/jira/browse/NIFI-7583). You can also raise that limit in nifi.properties.
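The 20,000 figure appears to match the shipped default of `nifi.queue.swap.threshold` in nifi.properties. Once a connection's queue grows past that value, NiFi starts swapping FlowFiles out to disk. The minimal Python sketch below checks whether a connection's back pressure object threshold would let its queue grow past that swap point. The helper names `parse_properties` and `queue_may_swap` are hypothetical, and the parser only handles plain `key=value` lines, not the full Java properties syntax. Back pressure is a soft limit, so treat the answer as approximate.

```python
from typing import Dict

# Shipped default for nifi.queue.swap.threshold in nifi.properties.
DEFAULT_SWAP_THRESHOLD = 20000


def parse_properties(text: str) -> Dict[str, str]:
    """Parse plain key=value lines, skipping blanks and '#' comments.

    Deliberately simplified: it does not handle ':' separators,
    '!' comments or line continuations from the full Java format.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def queue_may_swap(props: Dict[str, str], backpressure_count: int) -> bool:
    """Return True if a connection whose back pressure object threshold is
    backpressure_count could hold more FlowFiles than the swap threshold."""
    threshold = int(props.get("nifi.queue.swap.threshold", DEFAULT_SWAP_THRESHOLD))
    return backpressure_count > threshold
```

With the default threshold, `queue_may_swap(parse_properties("nifi.queue.swap.threshold=20000"), 20000)` returns `False`, while a back pressure threshold of 50,000 returns `True`.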

2020-09-11 Thread Chris Sampson
One thing we've not done yet, but I think might help, is to stripe disks for each repo too (i.e. multiple disks for content, etc.), which will help spread the disk I/O.

Cheers,
Chris Sampson
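One way to put multiple disks behind the content repository without relying on RAID striping is NiFi's support for extra repository directories. nifi.properties accepts additional content locations under the `nifi.content.repository.directory.` prefix, each with a unique suffix, and the provenance repository accepts the same pattern under `nifi.provenance.repository.directory.`. The minimal Python sketch below generates those entries for a list of mount points. The helper name, the `/diskN` mounts and the `<label>_repository` folder names are all hypothetical.

```python
from typing import Dict, List


def striped_repo_properties(prefix: str, label: str, mounts: List[str]) -> Dict[str, str]:
    """Build one '<prefix>.<suffix>=<path>' entry per mount point.

    The first mount keeps the 'default' suffix; the others get
    '<label>1', '<label>2', ... Suffixes only need to be unique.
    """
    props: Dict[str, str] = {}
    for i, mount in enumerate(mounts):
        suffix = "default" if i == 0 else f"{label}{i}"
        props[f"{prefix}.{suffix}"] = f"{mount.rstrip('/')}/{label}_repository"
    return props


if __name__ == "__main__":
    # Hypothetical mount points; replace with your own disks.
    for key, value in striped_repo_properties(
        "nifi.content.repository.directory", "content", ["/disk1", "/disk2", "/disk3"]
    ).items():
        print(f"{key}={value}")
```

Run as a script, this prints three lines, from `nifi.content.repository.directory.default=/disk1/content_repository` through `nifi.content.repository.directory.content2=/disk3/content_repository`. Passing `"nifi.provenance.repository.directory"` and `"provenance"` produces the equivalent provenance entries.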

2020-09-11 Thread Mike Thomsen
Craig and Jeremy,

Thanks. The point about using different disks for different repositories is definitely something to add to the list.

2020-09-11 Thread Jeremy Dyer
Hey Mike,

When you say "flows that may drop in several million ... flowfiles", I read that as a single node that might be inundated with tons of source data (local files, FTP, Kafka messages, etc.). Just my 2 cents, but if you don't have strict SLAs (and this kind of sounds like a one-time thing) I …

2020-09-11 Thread Craig Connell
Hi Mike,

I might have a few more pointers to offer when I can get unburied from some other work, but a couple of things jump to mind:

- I think for that many flowfiles, you will want to make sure you have separate disks set up for data provenance. We have several …
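To confirm that provenance (and the other repositories) really ended up on separate disks, one rough check is to compare the device ID that `os.stat()` reports for each repository directory. The sketch below uses a hypothetical helper, `shared_devices`, which is not part of NiFi. Note that `st_dev` identifies a mounted filesystem rather than a physical drive, so two partitions on the same disk would still pass.

```python
import os
from collections import defaultdict
from typing import Dict, List


def shared_devices(repo_dirs: Dict[str, str]) -> List[List[str]]:
    """Group repository names whose directories report the same st_dev.

    Every returned group has two or more names that live on one
    filesystem; an empty list means each repository has its own.
    The directories must already exist, or os.stat raises.
    """
    by_device: Dict[int, List[str]] = defaultdict(list)
    for name, path in repo_dirs.items():
        by_device[os.stat(path).st_dev].append(name)
    return [sorted(names) for names in by_device.values() if len(names) > 1]
```

For example, `shared_devices({"flowfile": "/data1/flowfile_repository", "content": "/data2/content_repository", "provenance": "/data3/provenance_repository"})` should return `[]` when those placeholder paths are mounted from three different filesystems.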