Thank you very much, Joe. This is very good to know. We are indeed working with CentOS, so I can explore with my users the option of using a '.' prefix while a file is being written and renaming it when done.
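For anyone on the writing side, the dot-prefix-then-rename pattern might look roughly like the sketch below. This is only an illustration under a few assumptions: the function name `write_then_publish` is made up for this example, the hidden and final names live in the same directory (so the rename stays on one filesystem), and the consumer skips dot-prefixed files.

```python
import os


def write_then_publish(directory: str, final_name: str, data: bytes) -> str:
    """Write data under a dot-prefixed name, then rename it to its final name.

    Hypothetical helper for illustration; not part of NiFi.
    """
    hidden_path = os.path.join(directory, "." + final_name)
    final_path = os.path.join(directory, final_name)

    # While the file is being written it is only visible under the hidden name.
    with open(hidden_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing

    # On POSIX, a rename within one filesystem is atomic, so a consumer sees
    # either no file under the final name or the complete file.
    os.replace(hidden_path, final_path)
    return final_path
```

The rename is the "protocol" signal here: the final name only ever appears once the content is complete.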
But I'm not certain I can levy that requirement on my users, or depend on them to always follow it. So in combination with that I will use a Minimum File Age of 30 or 60 seconds in my ListFile processors. That should be more than ample margin, and since my ListFile runs with the default Yield duration of 1 sec, the files will be picked up in a subsequent processor run quite rapidly.

Thanks again.

On Fri, Jul 14, 2017 at 9:53 AM, Joe Witt <[email protected]> wrote:

> Jim,
>
> Ultimately this comes down to whether any consuming process (not just
> NiFi) can reliably know that a given file is 'ready to be consumed'.
> If the writer of those files offers no 'protocol' by which you can
> know, then unfortunately it is about 'having a pretty good guess' that
> they're done.
>
> One of the simpler and more reliable ways to know the file writer is
> done is that the file writer changes the name of the file when it is
> done. The most common pattern here is that they write the file with a
> name prepended with a 'dot'. In *nix this is often considered a
> 'hidden' file. The ListFile processor ignores hidden files by default.
>
> After that it is a set of less awesome options. The next most
> reliable option most likely is to use the file age (based on
> modification time), and ListFile makes this available to you. The
> problem with this is that you're not guaranteed it will be updated, and
> administrators of systems can disable updates to modification time if
> they want to. However, if in your case this is a reliable option you
> could use that.
>
> We could also add something to the processor to make listings slower
> whereby it would scan a couple of times to see if the file size is
> still changing. But this is also not very reliable.
>
> In short, the processor gives you options to handle this, but you also
> have to keep in mind that unless there is some reliable 'protocol'
> here you are basically guessing at whether the file is ready. This is
> a 'how file IO works' thing more than a 'what NiFi can do' thing.
>
> Thanks
> Joe
>
> On Fri, Jul 14, 2017 at 9:39 AM, James McMahon <[email protected]> wrote:
>
> > A fundamental question was asked by one of the consumers who depend on
> > my NiFi workflows for transfer of critical data. I wasn't entirely
> > certain of the answer and feel that I really should better understand
> > this.
> >
> > When using a ListFile/FetchFile combo or even a simple GetFile
> > processor, how does NiFi ensure that it does not ingest from a
> > targeted directory any files that an external process may still be
> > writing to or editing? Is it bound by file locks that have been
> > established at the system level by those external processes?
> >
> > Thanks in advance for any insights that help me better explain this to
> > consumers of my NiFi workflows. -Jim
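
For reference, the file-age fallback discussed in this thread can be sketched in a few lines. This is not NiFi's implementation, only an illustration of the idea behind a minimum-age filter: skip dot-prefixed names, and list a file only once its modification time is at least N seconds in the past. The function name `files_older_than` and its parameters are made up for this example, and the approach assumes modification times are actually being updated on the filesystem.

```python
import os
import time


def files_older_than(directory, min_age_seconds, now=None):
    """Return names of visible regular files not modified for min_age_seconds.

    Hypothetical helper for illustration; 'now' can be injected for testing.
    """
    now = time.time() if now is None else now
    ready = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue  # skip directories and other non-regular entries
        if entry.name.startswith("."):
            continue  # treat dot-prefixed files as hidden / still in progress
        age = now - entry.stat().st_mtime
        if age >= min_age_seconds:
            ready.append(entry.name)
    return sorted(ready)
```

As noted above, this remains a guess: a writer that pauses for longer than the chosen age will still be picked up mid-write.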
