Thank you very much, Joe. This is very good to know. We are indeed working
with CentOS, so I can explore with my users the option of using a '.' prefix
while a file is still being written, and renaming it when done.
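
For illustration, here is a minimal sketch of what the writer side of that
convention could look like, assuming the writing process is a Python script.
The function name write_then_rename and its parameters are hypothetical, not
part of NiFi or of any tool my users run today. It relies on the rename
staying within one directory, so the file only appears under its final name
once it is complete.

```python
import os


def write_then_rename(directory, name, data):
    """Write bytes to a dot-prefixed name, then rename to the final name."""
    hidden_path = os.path.join(directory, "." + name)
    final_path = os.path.join(directory, name)

    # While this runs, the file is 'hidden' and a listing that skips
    # dot-files will not pick it up.
    with open(hidden_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the contents to disk before the rename

    # Rename within the same directory; on POSIX this replaces the target
    # atomically, so readers never see a partially written final file.
    os.replace(hidden_path, final_path)
    return final_path
```

A writer would call something like write_then_rename("/data/in", "report.csv",
payload), where the directory and file name are placeholders.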

But I'm not certain I can levy that requirement on my users, or depend on
them to always enforce it. So in combination with that I will use a Minimum
File Age of 30 or 60 seconds in my ListFile processors. That should be more
than ample margin, and since my ListFile runs with the default Yield
duration of 1 sec, the files will be picked up in a subsequent processor
run quite rapidly. Thanks again.
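
To make the idea concrete for my consumers, here is a rough sketch of the
kind of filtering I mean: skip dot-prefixed files and skip anything modified
less than a minimum age ago. This is only an illustration in Python, not how
ListFile is actually implemented; the name list_ready_files and its
parameters are made up for the example, and it assumes modification times
are being updated on the filesystem.

```python
import os
import time


def list_ready_files(directory, min_age_seconds=30.0, now=None):
    """Return paths of regular, non-hidden files older than min_age_seconds."""
    now = time.time() if now is None else now
    ready = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue  # ignore subdirectories and other non-regular entries
        if entry.name.startswith("."):
            continue  # dot-prefixed files are treated as still in progress
        age = now - entry.stat().st_mtime
        if age >= min_age_seconds:
            ready.append(entry.path)
    return sorted(ready)
```

As Joe notes below, this is still a guess: a writer that pauses for longer
than the minimum age would be picked up early.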

On Fri, Jul 14, 2017 at 9:53 AM, Joe Witt <[email protected]> wrote:

> Jim,
>
> Ultimately this comes down to whether any consuming process (not just
> NiFi) can reliably know that a given file is 'ready to be consumed'.
> If the writer of those files offers no 'protocol' by which you can
> know then unfortunately it is about 'having a pretty good guess' that
> they're done.
>
> One of the simpler and more reliable ways to know the file writer is
> done is that the file writer changes the name of the file when it is
> done.  Most common pattern here is they write the file with a name
> prepended with a 'dot'.  In *nix this is often considered a 'hidden'
> file.  The ListFile processor ignores such hidden files by default.
>
> After that it is a set of less awesome options.  The next most
> reliable option most likely is to use the file age (based on
> modification time) and ListFile makes this available to you.  The
> problem with this is that you're not guaranteed it will be updated and
> administrators of systems can disable updates to modification time if
> they want to.  However, if in your case this is a reliable option you
> could use that.
>
> We could also add something to the processor to make listings slower
> whereby it would scan a couple times to see if the file size is still
> changing.  But this is also not very reliable.
>
> In short, the processor gives you options to handle this but you also
> have to keep in mind that unless there is some reliable 'protocol'
> here you are basically guessing at whether the file is ready.  This is
> a 'how file IO works' thing more than a 'what NiFi can do' thing.
>
> Thanks
> Joe
>
>
> On Fri, Jul 14, 2017 at 9:39 AM, James McMahon <[email protected]>
> wrote:
> > A fundamental question was asked by one of the consumers who depend on my
> > NiFi workflows for transfer of critical data. I wasn't entirely certain of
> > the answer and feel that I really should better understand this.
> >
> > When using a ListFile/FetchFile combo or even a simple GetFile processor,
> > how does NiFi ensure that it does not ingest from a targeted directory any
> > files that an external process may still be writing to or editing? Is it
> > bound by file locks that have been established at the system level by those
> > external processes?
> >
> > Thanks in advance for any insights that help me better explain this to
> > consumers of my NiFi workflows.  -Jim
>
