One thing we've not done yet, but which I think might help, is to stripe disks for each repo too, i.e. use multiple disks for the content repository, etc., which should help spread the disk I/O.
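
As a rough sketch of what that could look like in nifi.properties (the /mnt/diskN mount points and the content1/provenance1-style suffixes are placeholders I've made up, not something we actually run): NiFi accepts several content and provenance repository directories as long as each property key has a unique suffix, whereas the flowfile repository takes a single directory only.

```properties
# Hypothetical mount points: one physical disk per directory.
# Content repository spread over three disks (key suffixes are arbitrary but must be unique).
nifi.content.repository.directory.content1=/mnt/disk1/content_repository
nifi.content.repository.directory.content2=/mnt/disk2/content_repository
nifi.content.repository.directory.content3=/mnt/disk3/content_repository

# Provenance repository spread over two further disks in the same way.
nifi.provenance.repository.directory.provenance1=/mnt/disk4/provenance_repository
nifi.provenance.repository.directory.provenance2=/mnt/disk5/provenance_repository

# The flowfile repository takes a single directory, so give it a disk of its own.
nifi.flowfile.repository.directory=/mnt/disk6/flowfile_repository
```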

Cheers,

Chris Sampson

On Fri, 11 Sep 2020, 22:46 Mike Thomsen, <[email protected]> wrote:

> Craig and Jeremy,
>
> Thanks. The point about using different disks for different
> repositories is definitely something to add to the list.
>
> On Fri, Sep 11, 2020 at 3:11 PM Jeremy Dyer <[email protected]> wrote:
> >
> > Hey Mike,
> >
> > When you say "flows that may drop in several million ... flowfiles" I
> > read that as a single node that might be inundated with tons of source data
> > (local files, ftp, kafka messages, etc). Just my 2 cents but if you don't
> > have strict SLAs (and this kind of sounds like a 1 time thing) I wouldn't
> > even worry about it and just let the system back pressure and process in
> > time as designed. That process will be "safe" although maybe not fast. If
> > you need speed throw lots of NVMe mounts at it. We process well into the
> > tens (sometimes hundreds) of millions of flowfiles a day on a 5 node
> > cluster with no issues. However our hardware is quite over the top.
> >
> > Thanks,
> > Jeremy Dyer
> >
> > On Fri, Sep 11, 2020 at 12:51 PM Mike Thomsen <[email protected]> wrote:
> >>
> >> What are the general recommended practices around tuning NiFi to
> >> safely handle flows that may drop in several million very small
> >> flowfiles (2k-10kb each) onto a single node? It's possible that some
> >> of the data dumps we're processing (and we can't control their size)
> >> will drop about 3.5-5M flowfiles the moment we expand them in the
> >> flow.
> >>
> >> (Let me emphasize again, it was not our idea to dump the data this way)
> >>
> >> Any pointers would be appreciated.
> >>
> >> Thanks,
> >>
> >> Mike
