Jim,

Ultimately this comes down to whether any consuming process (not just NiFi) can reliably know that a given file is 'ready to be consumed'. If the writer of those files offers no 'protocol' by which you can know, then unfortunately it comes down to 'having a pretty good guess' that the writer is done.
One of the simpler and more reliable ways to know the file writer is done is for the writer to rename the file once it has finished. The most common pattern is to write the file under a name prefixed with a dot, which on *nix is conventionally treated as a 'hidden' file, and then rename it without the dot on completion. The ListFile processor ignores hidden files by default, so it will not pick up the file until the rename happens.

After that it is a set of less awesome options. The next most reliable option is most likely to use the file age (based on modification time), and ListFile makes this available to you. The problem is that you're not guaranteed the modification time will be updated, and administrators of systems can disable updates to it if they want to. However, if this is a reliable option in your case, you could use it.

We could also add something to the processor to make listings slower, whereby it would scan a couple of times to see if the file size is still changing. But this is also not very reliable.

In short, the processor gives you options to handle this, but keep in mind that unless there is some reliable 'protocol' here you are basically guessing at whether the file is ready. This is a 'how file IO works' thing more than a 'what NiFi can do' thing.

Thanks
Joe

On Fri, Jul 14, 2017 at 9:39 AM, James McMahon <[email protected]> wrote:
> A fundamental question was asked by one of the consumers who depend on my
> NiFi workflows for transfer of critical data. I wasn't entirely certain of
> the answer and feel that I really should better understand this.
>
> When using a ListFile/FetchFile combo or even a simple GetFile processor,
> how does Nifi ensure that it does not ingest from a targeted directory any
> files that an external process may still be writing to or editing? Is it
> bound by file locks that have been established at the system level by those
> external processes?
>
> Thanks in advance for any insights that help me better explain this to
> consumers of my NiFi workflows. -Jim
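
For illustration only, the dot-file rename 'protocol' and the file-age fallback discussed in the reply could be sketched roughly as below. This is plain Python, not NiFi code and not how ListFile is implemented. The names write_then_rename, ready_files and min_age_seconds are hypothetical, and the sketch assumes the temporary and final names live on the same filesystem so that the rename is atomic.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def write_then_rename(directory: Path, name: str, data: bytes) -> Path:
    """Writer side (hypothetical helper): write under a dot-prefixed
    name, then rename to the final name once the content is complete."""
    tmp = directory / f".{name}"
    final = directory / name
    with open(tmp, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before renaming
    # Assumption: same filesystem, so the rename is atomic on POSIX.
    os.replace(tmp, final)
    return final


def ready_files(directory: Path,
                min_age_seconds: float = 0.0,
                now: Optional[float] = None) -> List[Path]:
    """Consumer side (hypothetical helper): skip dot-files and,
    optionally, files modified more recently than min_age_seconds."""
    now = time.time() if now is None else now
    result = []
    for entry in sorted(directory.iterdir()):
        # Dot-prefixed names are treated as still being written.
        if entry.name.startswith(".") or not entry.is_file():
            continue
        # The age check is only a guess: it trusts the modification time.
        if min_age_seconds > 0 and now - entry.stat().st_mtime < min_age_seconds:
            continue
        result.append(entry)
    return result
```

The rename check is the reliable part, because it depends on something the writer does deliberately. The age filter is the 'pretty good guess' fallback and is only as trustworthy as the modification times on that system.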
