Thanks, that's actually what I ended up doing. In case anyone comes along looking for this, the approach we used for development was: GetFile -> SplitText (50k lines per chunk) -> SplitText (1 line per flowfile) -> the rest of the flow.
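For anyone wondering why the intermediate 50k split matters, here is a rough standalone sketch of the same two-stage idea in plain Java. This is illustrative only, not NiFi or SplitText internals; the chunk size constant and the processChunk hook are placeholders:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TwoStageSplit {

    // Mirrors the line count we used on the first SplitText.
    static final int CHUNK_SIZE = 50_000;

    public static void main(String[] args) throws IOException {
        // args[0]: path to the large input file (placeholder).
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            List<String> chunk = new ArrayList<>(CHUNK_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == CHUNK_SIZE) {
                    processChunk(chunk);   // stage two happens per chunk
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                processChunk(chunk);       // trailing partial chunk
            }
        }
    }

    // Stage two: split one medium-size chunk into individual records.
    // Only one chunk is ever held at a time, never all of the records.
    static void processChunk(List<String> records) {
        for (String record : records) {
            // hand each record to the rest of the pipeline here
        }
    }
}

The point is the same one Andy makes below: you never materialize a million single-record units in one step.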
On Fri, Apr 7, 2017 at 1:11 PM, Andy LoPresto <[email protected]> wrote:

> Mike,
>
> Are the files a single coherent piece of information (e.g. a video file)
> or collections of smaller atomic units of data (e.g. CSV, JSON batches)?
> In the first case, it's important to ensure that the processors which
> deal with the content do so in a streaming manner so as not to exhaust
> your heap (and ensure any custom processors you develop do the same).
> With the latter, when splitting and merging these records, we generally
> propose a two-step approach, where a single giant file is split into
> medium-size flowfiles, and then each of these is split into individual
> records (e.g. 1 * 1MM -> 10 * 100K -> 10 * 100K * 1, as opposed to
> 1 * 1MM -> 1MM * 1).
>
> Other than that, be sure to follow the best practices for configuration
> in the Admin Guide [1] and read about performance expectations [2].
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
> [2] https://nifi.apache.org/docs/nifi-docs/html/overview.html#performance-expectations-and-characteristics-of-nifi
>
> Andy LoPresto
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> On Apr 7, 2017, at 5:26 AM, Mike Thomsen <[email protected]> wrote:
>
> I have one flow that will have to handle files that are anywhere from
> 500 MB to several GB in size. The current plan is to store them in HDFS
> or S3 and then bring them down for processing in NiFi. Are there any
> suggestions on how to handle such large single files?
>
> Thanks,
>
> Mike
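Also, for anyone who ends up writing a custom processor for files this size: the streaming content access Andy mentions looks roughly like this against the ProcessSession API. A minimal sketch of the body of onTrigger() in a processor extending AbstractProcessor; REL_SUCCESS and the empty read loop are placeholders for your own relationship and per-byte logic:

import java.io.InputStream;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;

@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    // session.read() hands us the content as a stream, so only this 8 KB
    // buffer sits on the heap no matter how large the flowfile is.
    session.read(flowFile, (final InputStream in) -> {
        final byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            // inspect or consume the bytes incrementally here
        }
    });
    session.transfer(flowFile, REL_SUCCESS); // REL_SUCCESS: your own relationship
}

The thing to avoid is reading the whole content into a byte[] (or a String), which is what blows the heap on multi-GB flowfiles.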
