Neither GetFile nor FetchFile reads the file into memory; they only deal with the file handle and stream the contents into the content repository (NiFi writes data into, and reads it back from, the content repository as streams).
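For illustration, here is a minimal sketch (not the actual GetFile/FetchFile source, just the pattern) of the ProcessSession streaming API that processors build on; the callback receives an InputStream over the content repository, so only a small buffer is ever held in heap:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;

class StreamingReadSketch {

    // Reads the FlowFile's content line by line; regardless of content size,
    // only the current buffered line is in memory at any moment.
    void readLineByLine(ProcessSession session, FlowFile flowFile) {
        session.read(flowFile, in -> {
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // handle one line here (e.g. decide which Kafka topic it belongs to)
                }
            }
        });
    }
}
```

This streaming callback style is the general pattern the standard processors follow, which is why the file itself never has to fit in heap.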
What you will face, however, is an issue with SplitText when you try to split the whole file in a single transaction: it may fail depending on the JVM heap allocated and the file size. A recommended best practice in this case is to chain two SplitText processors: the 1st pass splits into e.g. 10,000-line chunks, the 2nd into individual lines. Adjust the chunk size for your expected file sizes and available memory (a code sketch of this two-pass flow follows below the quoted message).

HTH,
Andrew

On Mon, Nov 14, 2016 at 7:23 AM Raf Huys <[email protected]> wrote:

> I would like to read in a large amount (several gigs) of logdata, and route every
> line to a (potentially different) Kafka topic.
>
> - I don't want this file to be in memory
> - I want it to be read once, not more
>
> Using `GetFile` takes the whole file in memory. Same with `FetchFile` as
> far as I can see.
>
> I also used an `ExecuteProcess` processor in which the file is `cat` and
> which splits off a flowfile every millisecond. This looked to be a somewhat
> streaming approach to the problem, but this processor runs continuously (or
> cron-based) and as a consequence the logfile is re-injected all the time.
>
> What's the typical NiFi approach for this? Tx
>
> Raf Huys
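As mentioned above, here is a rough sketch of the two-pass split, expressed with NiFi's mock test framework (assumes nifi-mock and nifi-standard-processors on the classpath) rather than the UI; the log file path is hypothetical:

```java
import java.nio.file.Paths;

import org.apache.nifi.processors.standard.SplitText;
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;

public class TwoPassSplitSketch {
    public static void main(String[] args) throws Exception {
        // 1st pass: large file -> 10,000-line chunks
        TestRunner firstPass = TestRunners.newTestRunner(SplitText.class);
        firstPass.setProperty(SplitText.LINE_SPLIT_COUNT, "10000");
        firstPass.enqueue(Paths.get("/tmp/big.log")); // hypothetical path
        firstPass.run();

        // 2nd pass: each chunk -> individual lines
        TestRunner secondPass = TestRunners.newTestRunner(SplitText.class);
        secondPass.setProperty(SplitText.LINE_SPLIT_COUNT, "1");
        for (MockFlowFile chunk : firstPass.getFlowFilesForRelationship(SplitText.REL_SPLITS)) {
            secondPass.enqueue(chunk.toByteArray());
            secondPass.run();
        }

        System.out.println("single-line flowfiles: "
            + secondPass.getFlowFilesForRelationship(SplitText.REL_SPLITS).size());
    }
}
```

In an actual flow you would just drop two SplitText processors on the canvas with Line Split Count set to 10000 and 1 respectively, routing the first one's `splits` relationship into the second.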
