Mike,

Are the files a single coherent piece of information (e.g. a video file) or collections of smaller atomic units of data (e.g. CSV or JSON batches)? In the first case, it's important to ensure that the processors which deal with the content do so in a streaming manner so as not to exhaust your heap (and ensure any custom processors you develop do the same); a minimal sketch of that pattern follows below. With the latter, when splitting and merging these records, we generally propose a two-step approach, where a single giant file is split into medium-sized flowfiles and each of those is then split into individual records (i.e. 1 * 1MM -> 10 * 100K -> 10 * 100K * 1 rather than 1 * 1MM -> 1MM * 1).
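To make the streaming point concrete, here is a minimal sketch of that pattern in a custom processor. The processor and relationship names are illustrative and relationship registration via getRelationships() is elided for brevity; the key is ProcessSession.read() with an InputStreamCallback, so only a small fixed-size buffer is ever on the heap, regardless of how large the flowfile content is:

    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;
    import org.apache.nifi.processor.io.InputStreamCallback;

    public class StreamingContentProcessor extends AbstractProcessor {

        // Illustrative; in a real processor this would be returned
        // from getRelationships().
        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .build();

        @Override
        public void onTrigger(final ProcessContext context, final ProcessSession session)
                throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }

            // Read the content as a stream: only this small buffer lives on
            // the heap at any moment, whether the flowfile is 500 MB or 5 GB.
            session.read(flowFile, new InputStreamCallback() {
                @Override
                public void process(final InputStream in) throws IOException {
                    final byte[] buffer = new byte[8192];
                    int len;
                    while ((len = in.read(buffer)) != -1) {
                        // handle this chunk (hash it, count records, etc.);
                        // never accumulate the full content in a byte[] or String
                    }
                }
            });

            session.transfer(flowFile, REL_SUCCESS);
        }
    }

The two-step split is the same idea expressed in flow configuration rather than code: for example, two SplitText processors in series, the first with a Line Split Count in the tens of thousands and the second with a Line Split Count of 1, mirrored by MergeContent processors downstream if you need to reassemble the results.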
Other than that, be sure to follow the best practices for configuration in the Admin Guide [1] and read about performance expectations [2].

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
[2] https://nifi.apache.org/docs/nifi-docs/html/overview.html#performance-expectations-and-characteristics-of-nifi

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Apr 7, 2017, at 5:26 AM, Mike Thomsen <[email protected]> wrote:
>
> I have one flow that will have to handle files that are anywhere from 500mb
> to several GB in size. The current plan is to store them in HDFS or S3 and
> then bring them down for processing in NiFi. Are there any suggestions on how
> to handle such large single files?
>
> Thanks,
>
> Mike
