One thing we haven't done yet, but which I think might help, is to stripe
disks for each repository too, i.e. use multiple disks for the content
repository, etc., which should help spread the disk I/O.
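
As a rough, untested sketch of what that could look like in nifi.properties:
as far as I understand, NiFi spreads content and provenance data across every
directory declared with a unique suffix on these property prefixes. The
/mnt/diskN mount points and the content1/content2/prov1/prov2 suffixes below
are made up for illustration, so check the Admin Guide for your version
before relying on this.

```
# Hypothetical mount points, one per physical disk.
# Content repository spread over two disks.
nifi.content.repository.directory.content1=/mnt/disk1/content_repository
nifi.content.repository.directory.content2=/mnt/disk2/content_repository

# Provenance repository spread over two more disks.
nifi.provenance.repository.directory.prov1=/mnt/disk3/provenance_repository
nifi.provenance.repository.directory.prov2=/mnt/disk4/provenance_repository

# The FlowFile repository takes a single directory; give it its own disk.
nifi.flowfile.repository.directory=/mnt/disk5/flowfile_repository
```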


Cheers,

Chris Sampson

On Fri, 11 Sep 2020, 22:46 Mike Thomsen wrote:

> Craig and Jeremy,
>
> Thanks. The point about using different disks for different
> repositories is definitely something to add to the list.
>
> On Fri, Sep 11, 2020 at 3:11 PM Jeremy Dyer wrote:
> >
> > Hey Mike,
> >
> > When you say "flows that may drop in several million ... flowfiles" I
> > read that as a single node that might be inundated with tons of source
> > data (local files, ftp, kafka messages, etc). Just my 2 cents but if you
> > don't have strict SLAs (and this kind of sounds like a 1 time thing) I
> > wouldn't even worry about it and just let the system back pressure and
> > process in time as designed. That process will be "safe" although maybe
> > not fast. If you need speed throw lots of NVMe mounts at it. We process
> > well into the tens (sometimes hundreds) of millions of flowfiles a day
> > on a 5 node cluster with no issues. However our hardware is quite over
> > the top.
> >
> > Thanks,
> > Jeremy Dyer
> >
> > On Fri, Sep 11, 2020 at 12:51 PM Mike Thomsen wrote:
> >>
> >> What are the general recommended practices around tuning NiFi to
> >> safely handle flows that may drop in several million very small
> >> flowfiles (2-10 KB each) onto a single node? It's possible that some
> >> of the data dumps we're processing (and we can't control their size)
> >> will drop about 3.5-5M flowfiles the moment we expand them in the
> >> flow.
> >>
> >> (Let me emphasize again, it was not our idea to dump the data this way)
> >>
> >> Any pointers would be appreciated.
> >>
> >> Thanks,
> >>
> >> Mike
>
