Jim,

Ultimately this comes down to whether any consuming process (not just
NiFi) can reliably know that a given file is 'ready to be consumed'.
If the writer of those files offers no 'protocol' by which you can
know then unfortunately it is about 'having a pretty good guess' that
they're done.

One of the simpler and more reliable ways to know the file writer is
done is for the writer to change the name of the file when it is
done.  The most common pattern is to write the file under a name
prefixed with a 'dot' and rename it to drop the dot once the write is
complete.  In *nix such a file is often considered 'hidden', and the
ListFile processor ignores hidden files by default.
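
As a rough sketch of the writer side of that pattern (Python here
just for illustration; the function name and file names are made up,
and it assumes the hidden file and the final file live in the same
directory so the rename does not cross filesystems):

```python
import os


def write_then_rename(directory, name, data):
    """Write data to a hidden dot-file, then rename it to its final name."""
    hidden_path = os.path.join(directory, "." + name)
    final_path = os.path.join(directory, name)

    # While this file is being written its name starts with a dot, so a
    # consumer that skips hidden files will not pick it up.
    with open(hidden_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # The rename is the 'done' signal. os.replace overwrites any existing
    # file that already has the final name.
    os.replace(hidden_path, final_path)
    return final_path
```

The consumer never has to guess with this approach: a file either has
its final name, in which case it is complete, or it is still hidden.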

After that it is a set of less awesome options.  The next most
reliable option most likely is to use the file age (based on
modification time) and ListFile makes this available to you.  The
problem with this is that you're not guaranteed the modification time
will be updated, and administrators of systems can disable updates to
modification time if they want to.  However, if in your case this is
a reliable option you could use that.
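
To make the file age idea concrete, here is a minimal sketch of the
check itself.  This is not ListFile's code, just an illustration; the
function name and the threshold values are assumptions:

```python
import os
import time


def is_old_enough(path, min_age_seconds, now=None):
    """Return True if the file was last modified at least min_age_seconds ago."""
    # 'now' can be passed in to make the check repeatable; it defaults
    # to the current wall-clock time.
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds >= min_age_seconds
```

A writer that pauses for longer than the threshold in the middle of a
file will still fool this check, which is why it remains a guess.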

We could also add something to the processor to make listings slower
whereby it would scan a couple times to see if the file size is still
changing.  But this is also not very reliable.
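
For what it's worth, such a check might look roughly like the sketch
below.  This is hypothetical and not something the processor does
today; the function name, the number of checks, and the interval are
all made up for illustration:

```python
import os
import time


def size_is_stable(path, checks=3, interval_seconds=1.0):
    """Return True if the file size is unchanged across 'checks' samples."""
    last_size = os.path.getsize(path)
    for _ in range(checks - 1):
        time.sleep(interval_seconds)
        current_size = os.path.getsize(path)
        if current_size != last_size:
            # The size changed between samples, so the writer is
            # probably still active.
            return False
        last_size = current_size
    return True
```

A stable size only shows that nothing was appended during the sampling
window; the writer may simply be slow or stalled.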

In short, the processor gives you options to handle this but you also
have to keep in mind that unless there is some reliable 'protocol'
here you are basically guessing at whether the file is ready.  This is
a 'how file IO works' thing more than a 'what NiFi can do' thing.

Thanks
Joe


On Fri, Jul 14, 2017 at 9:39 AM, James McMahon <[email protected]> wrote:
> A fundamental question was asked by one of the consumers who depend on my
> NiFi workflows for transfer of critical data. I wasn't entirely certain of
> the answer and feel that I really should better understand this.
>
> When using a ListFile/FetchFile combo or even a simple GetFile processor,
> how does Nifi ensure that it does not ingest from a targeted directory any
> files that an external process may still be writing to or editing? Is it
> bound by file locks that have been established at the system level by those
> external processes?
>
> Thanks in advance for any insights that help me better explain this to
> consumers of my NiFi workflows.  -Jim
