Hi James, Pierre,

ListFile resets its state (including the latest entry it has listed) when Minimum File Age is changed. ListFile.isListingResetNecessary implements this behavior.

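To illustrate the reset described above, here is a minimal Java sketch. It is a simplified model and not NiFi's actual code: the class ListingStateSketch, the method onPropertyModified, and the set of property names that trigger a reset are assumptions made for illustration. Only the name isListingResetNecessary is taken from ListFile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of a listing processor's state; NOT NiFi's implementation.
class ListingStateSketch {

    // Assumed, illustrative set of properties whose change invalidates the stored state.
    private static final Set<String> RESET_PROPERTIES = new HashSet<>(
            Arrays.asList("Input Directory", "File Filter", "Minimum File Age"));

    // Newest modification time that has already been listed (-1 means nothing listed yet).
    private long latestListedTimestamp = -1L;

    // Same idea as the check named in this thread: does changing this property require a reset?
    static boolean isListingResetNecessary(String propertyName) {
        return RESET_PROPERTIES.contains(propertyName);
    }

    // Hypothetical hook called whenever a property value is changed.
    void onPropertyModified(String propertyName) {
        if (isListingResetNecessary(propertyName)) {
            latestListedTimestamp = -1L; // forget everything that was listed before
        }
    }

    // Lists entries newer than the stored timestamp, then advances the timestamp.
    List<String> list(Map<String, Long> modifiedTimeByName) {
        List<String> listed = new ArrayList<>();
        long newest = latestListedTimestamp;
        for (Map.Entry<String, Long> entry : modifiedTimeByName.entrySet()) {
            if (entry.getValue() > latestListedTimestamp) {
                listed.add(entry.getKey());
                newest = Math.max(newest, entry.getValue());
            }
        }
        latestListedTimestamp = newest;
        return listed;
    }
}
```

In this model, a second call to list with unchanged files returns nothing, but once onPropertyModified("Minimum File Age") has been called, the same files are listed again. That matches the flood of older files James saw after changing the setting.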
Thanks,
Koji

On Tue, Jul 18, 2017 at 2:42 AM, Pierre Villard <[email protected]> wrote:
> Hi James,
>
> This parameter should not change the behavior of the processor regarding
> files already listed in previous trigger executions of the processor. Could
> it be possible that old files have been somehow modified by another process?
> That would explain why the processor listed the files one more time. If you
> can reproduce the issue, that's certainly a bug IMO.
>
> Thanks
> Pierre
>
> 2017-07-14 18:13 GMT+02:00 James McMahon <[email protected]>:
>>
>> Joe, I have a follow-up question. If I set Minimum File Age to 60 sec in
>> my ListFile processor, that does not override the typical behavior in which
>> ListFile does not include in its list output any file preceding its previous
>> run cycle, does it? I ask because I set Minimum File Age to 60 sec and have
>> seen a flood of additional files. Many of those files have date stamps
>> preceding the ListFile runs that have been executing over the course of the
>> last few days. I am trying to determine why this might be the case.
>>
>> Thanks for any thoughts or insights.
>>
>> On Fri, Jul 14, 2017 at 10:11 AM, James McMahon <[email protected]>
>> wrote:
>>>
>>> Thank you very much Joe. This is very good to know. We are indeed working
>>> with CentOS, and so I can explore with my users using a '.' prefix while
>>> working with the file, and renaming it when done.
>>>
>>> But I'm not certain I can levy that requirement on my users, or depend on
>>> them to always enforce it. So in combination with that I will use a Minimum
>>> File Age of 30 or 60 seconds in my ListFile processors. That should be more
>>> than ample margin, and since my ListFile runs with the default Yield
>>> duration of 1 sec, the files will be picked up in a subsequent processor run
>>> quite rapidly. Thanks again.
>>>
>>> On Fri, Jul 14, 2017 at 9:53 AM, Joe Witt <[email protected]> wrote:
>>>>
>>>> Jim,
>>>>
>>>> Ultimately this comes down to whether any consuming process (not just
>>>> NiFi) can reliably know that a given file is 'ready to be consumed'.
>>>> If the writer of those files offers no 'protocol' by which you can
>>>> know then unfortunately it is about 'having a pretty good guess' that
>>>> they're done.
>>>>
>>>> One of the simpler and more reliable ways to know the file writer is
>>>> done is that the file writer changes the name of the file when it is
>>>> done. The most common pattern here is that they write the file with a
>>>> name prepended with a 'dot'. In *nix this is often considered a 'hidden'
>>>> file. The ListFile processor ignores hidden files by default.
>>>>
>>>> After that it is a set of less awesome options. The next most
>>>> reliable option most likely is to use the file age (based on
>>>> modification time) and ListFile makes this available to you. The
>>>> problem with this is that you're not guaranteed it will be updated and
>>>> administrators of systems can disable updates to modification time if
>>>> they want to. However, if in your case this is a reliable option you
>>>> could use that.
>>>>
>>>> We could also add something to the processor to make listings slower
>>>> whereby it would scan a couple times to see if the file size is still
>>>> changing. But this is also not very reliable.
>>>>
>>>> In short, the processor gives you options to handle this but you also
>>>> have to keep in mind that unless there is some reliable 'protocol'
>>>> here you are basically guessing at whether the file is ready. This is
>>>> a 'how file IO works' thing more than a 'what NiFi can do' thing.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>>
>>>> On Fri, Jul 14, 2017 at 9:39 AM, James McMahon <[email protected]>
>>>> wrote:
>>>> > A fundamental question was asked by one of the consumers who depend on my
>>>> > NiFi workflows for transfer of critical data.
>>>> > I wasn't entirely certain of the answer and feel that I really should
>>>> > better understand this.
>>>> >
>>>> > When using a ListFile/FetchFile combo or even a simple GetFile processor,
>>>> > how does NiFi ensure that it does not ingest from a targeted directory any
>>>> > files that an external process may still be writing to or editing? Is it
>>>> > bound by file locks that have been established at the system level by those
>>>> > external processes?
>>>> >
>>>> > Thanks in advance for any insights that help me better explain this to
>>>> > consumers of my NiFi workflows. -Jim
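
The two readiness heuristics discussed in this thread (a leading-dot name while the file is being written, and a minimum age based on modification time) can be sketched in Java as follows. This is an illustrative sketch only and not how ListFile is implemented: the class ReadyFileFilter and its methods looksReady and listReady are hypothetical names. It also assumes that the writer renames the file when it is done and that the file system updates modification times.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of NiFi.
class ReadyFileFilter {

    // A file "looks ready" if its name has no leading dot and it was last
    // modified at least minAge before now. This is a guess, not a guarantee.
    static boolean looksReady(String fileName, Instant lastModified, Duration minAge, Instant now) {
        if (fileName.startsWith(".")) {
            return false; // writer is assumed to rename the file once it is complete
        }
        return !lastModified.plus(minAge).isAfter(now);
    }

    // Applies the check to every regular file directly inside dir.
    static List<Path> listReady(Path dir, Duration minAge) {
        Instant now = Instant.now();
        List<Path> ready = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path path : (Iterable<Path>) entries::iterator) {
                if (!Files.isRegularFile(path)) {
                    continue;
                }
                Instant lastModified = Files.getLastModifiedTime(path).toInstant();
                if (looksReady(path.getFileName().toString(), lastModified, minAge, now)) {
                    ready.add(path);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ready;
    }
}
```

For example, listReady(dir, Duration.ofSeconds(60)) would skip a file named .incoming.csv regardless of its age, and would skip incoming.csv until 60 seconds have passed since its last modification. As Joe points out, neither check can prove that the writer has finished; they only make a premature pickup less likely.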
