Mike,

Are the files each a single coherent piece of information (e.g. a video file) or 
collections of smaller atomic units of data (e.g. CSV or JSON batches)? In the 
first case, it’s important to ensure that the processors which deal with the 
content do so in a streaming manner so as not to exhaust your heap (and ensure 
any custom processors you develop do the same); see the sketch below. With the 
latter, when splitting and merging these records, we generally propose a 
two-step approach, where a single giant file is first split into medium-size 
flowfiles, and then each of those is split into individual records (i.e. 
1 * 1MM -> 10 * 100K -> 1MM * 1, as opposed to going from 1 * 1MM to 1MM * 1 
in a single step); a flow sketch follows the code below.
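
For the streaming case, here is a rough sketch of what that looks like inside a 
custom processor (the surrounding processor class and REL_SUCCESS are 
placeholders; the important part is ProcessSession.read() with an 
InputStreamCallback, which consumes the content in fixed-size chunks rather 
than pulling the whole file into a byte array):

import java.io.IOException;
import java.io.InputStream;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.io.InputStreamCallback;

@Override
public void onTrigger(final ProcessContext context, final ProcessSession session)
        throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    // Stream the content in 8 KB chunks; heap usage stays flat whether the
    // flowfile is 500 MB or several GB.
    session.read(flowFile, new InputStreamCallback() {
        @Override
        public void process(final InputStream in) throws IOException {
            final byte[] buffer = new byte[8192];
            int len;
            while ((len = in.read(buffer)) != -1) {
                // handle this chunk and discard it; never accumulate the
                // full content in memory
            }
        }
    });
    // REL_SUCCESS is a relationship you would define on the processor.
    session.transfer(flowFile, REL_SUCCESS);
}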
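
And for the record case, the two-step split might look like this as a flow 
(the split counts are illustrative and assume line-oriented records):

FetchHDFS (or FetchS3Object)
  -> SplitText (Line Split Count: 100000)   # 1 * 1MM -> 10 * 100K
  -> SplitText (Line Split Count: 1)        # 10 * 100K -> 1MM * 1
  -> ...per-record processing...
  -> MergeContent                           # reassemble downstream if needed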

Other than that, be sure to follow the best practices for configuration in the 
Admin Guide [1] and read about performance expectations [2].

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
[2] https://nifi.apache.org/docs/nifi-docs/html/overview.html#performance-expectations-and-characteristics-of-nifi


Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Apr 7, 2017, at 5:26 AM, Mike Thomsen <[email protected]> wrote:
> 
> I have one flow that will have to handle files that are anywhere from 500 MB 
> to several GB in size. The current plan is to store them in HDFS or S3 and 
> then bring them down for processing in NiFi. Are there any suggestions on how 
> to handle such large single files?
> 
> Thanks,
> 
> Mike
