Neither GetFile nor FetchFile reads the file into memory; they only deal with the file handle and stream the contents into the content repository (NiFi writes data into, and reads it back from, the content repository as streams).
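For illustration, here is a minimal sketch (not the actual GetFile/FetchFile source, just the pattern) of the ProcessSession streaming API that processors build on; the callback receives an InputStream over the content repository, so only a small buffer is ever held in heap:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;

class StreamingReadSketch {

    // Reads the FlowFile's content line by line; regardless of content size,
    // only the current buffered line is in memory at any moment.
    void readLineByLine(ProcessSession session, FlowFile flowFile) {
        session.read(flowFile, in -> {
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // handle one line here (e.g. decide which Kafka topic it belongs to)
                }
            }
        });
    }
}
```

This streaming callback style is the general pattern the standard processors follow, which is why the file itself never has to fit in heap.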
What you will face, however, is an issue with SplitText when you try to split the whole file in a single transaction: it may fail depending on the JVM heap allocated and the file size. A recommended best practice in this case is to chain two SplitText processors: the 1st pass splits into e.g. 10,000-line chunks, the 2nd into individual lines. Adjust the chunk size for your expected file sizes and available memory (a code sketch of this two-pass flow follows below the quoted message).

HTH,
Andrew

On Mon, Nov 14, 2016 at 7:23 AM Raf Huys <[email protected]> wrote:

> I would like to read in a large amount (several gigs) of logdata, and route every
> line to a (potentially different) Kafka topic.
>
> - I don't want this file to be in memory
> - I want it to be read once, not more
>
> Using `GetFile` takes the whole file in memory. Same with `FetchFile` as
> far as I can see.
>
> I also used an `ExecuteProcess` processor in which the file is `cat` and
> which splits off a flowfile every millisecond. This looked to be a somewhat
> streaming approach to the problem, but this processor runs continuously (or
> cron-based) and as a consequence the logfile is re-injected all the time.
>
> What's the typical NiFi approach for this? Tx
>
> Raf Huys
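As mentioned above, here is a rough sketch of the two-pass split, expressed with NiFi's mock test framework (assumes nifi-mock and nifi-standard-processors on the classpath) rather than the UI; the log file path is hypothetical:

```java
import java.nio.file.Paths;

import org.apache.nifi.processors.standard.SplitText;
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;

public class TwoPassSplitSketch {
    public static void main(String[] args) throws Exception {
        // 1st pass: large file -> 10,000-line chunks
        TestRunner firstPass = TestRunners.newTestRunner(SplitText.class);
        firstPass.setProperty(SplitText.LINE_SPLIT_COUNT, "10000");
        firstPass.enqueue(Paths.get("/tmp/big.log")); // hypothetical path
        firstPass.run();

        // 2nd pass: each chunk -> individual lines
        TestRunner secondPass = TestRunners.newTestRunner(SplitText.class);
        secondPass.setProperty(SplitText.LINE_SPLIT_COUNT, "1");
        for (MockFlowFile chunk : firstPass.getFlowFilesForRelationship(SplitText.REL_SPLITS)) {
            secondPass.enqueue(chunk.toByteArray());
            secondPass.run();
        }

        System.out.println("single-line flowfiles: "
            + secondPass.getFlowFilesForRelationship(SplitText.REL_SPLITS).size());
    }
}
```

In an actual flow you would just drop two SplitText processors on the canvas with Line Split Count set to 10000 and 1 respectively, routing the first one's `splits` relationship into the second.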
