We keep our queue limit at 20,000 to avoid data swapping back and forth between ArrayLists and Prioritized Queues. See this bug: https://issues.apache.org/jira/browse/NIFI-7583
You can also adjust that limit up in nifi.properties.

On Sat, Sep 12, 2020 at 1:15 AM Chris Sampson <chris.samp...@naimuri.com> wrote:

> One thing we've not done yet, but which I think might help, is to stripe
> disks for each repo too, i.e. multiple disks for content, etc., which will
> help spread the disk I/O.
>
> Cheers,
>
> Chris Sampson
>
> On Fri, 11 Sep 2020, 22:46 Mike Thomsen, <mikerthom...@gmail.com> wrote:
>
>> Craig and Jeremy,
>>
>> Thanks. The point about using different disks for different
>> repositories is definitely something to add to the list.
>>
>> On Fri, Sep 11, 2020 at 3:11 PM Jeremy Dyer <jdy...@gmail.com> wrote:
>> >
>> > Hey Mike,
>> >
>> > When you say "flows that may drop in several million ... flowfiles," I
>> > read that as a single node that might be inundated with tons of source
>> > data (local files, FTP, Kafka messages, etc.). Just my 2 cents, but if
>> > you don't have strict SLAs (and this kind of sounds like a one-time
>> > thing), I wouldn't even worry about it; just let the system apply back
>> > pressure and process in time as designed. That process will be "safe,"
>> > although maybe not fast. If you need speed, throw lots of NVMe mounts
>> > at it. We process well into the tens (sometimes hundreds) of millions
>> > of flowfiles a day on a 5-node cluster with no issues. However, our
>> > hardware is quite over the top.
>> >
>> > Thanks,
>> > Jeremy Dyer
>> >
>> > On Fri, Sep 11, 2020 at 12:51 PM Mike Thomsen <mikerthom...@gmail.com> wrote:
>> >>
>> >> What are the general recommended practices around tuning NiFi to
>> >> safely handle flows that may drop in several million very small
>> >> flowfiles (2 KB-10 KB each) onto a single node? It's possible that
>> >> some of the data dumps we're processing (and we can't control their
>> >> size) will drop about 3.5-5M flowfiles the moment we expand them in
>> >> the flow.
>> >>
>> >> (Let me emphasize again, it was not our idea to dump the data this way.)
>> >>
>> >> Any pointers would be appreciated.
>> >>
>> >> Thanks,
>> >>
>> >> Mike
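
[Editor's note] The two tuning knobs discussed in this thread (the per-queue swap threshold and spreading the repositories across disks) both live in conf/nifi.properties. A minimal sketch follows; the property names are NiFi's, but the mount paths and the disk1/disk2 suffixes are hypothetical examples (the suffixes are arbitrary labels you choose):

```properties
# conf/nifi.properties (sketch; paths below are hypothetical examples)

# Queues larger than this swap flowfiles out of heap to disk.
# Raising it keeps more flowfiles in memory, at the cost of heap.
nifi.queue.swap.threshold=20000

# Striping the content repository across multiple disks: define one
# property per disk; the suffix after "directory." is an arbitrary label.
nifi.content.repository.directory.disk1=/mnt/disk1/content_repository
nifi.content.repository.directory.disk2=/mnt/disk2/content_repository

# The flowfile and provenance repositories can each get their own disk too.
nifi.flowfile.repository.directory=/mnt/disk3/flowfile_repository
nifi.provenance.repository.directory.default=/mnt/disk4/provenance_repository
```

Note that the swap threshold applies per connection; the back-pressure object/size thresholds Jeremy alludes to are configured per connection in the flow itself, not in nifi.properties.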