Forgot to put a link to the implementation: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ListFile.java#L319
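The sketch below is a simplified illustration of why changing one of the properties discussed in this thread can cause already-listed files to show up again. It assumes a listing processor that remembers a "latest listed timestamp" watermark and forgets it when a reset-triggering property changes. The real ListFile keeps more state than this and works with PropertyDescriptor objects rather than strings, so the class and method names here (ListingResetSketch, shouldList, recordListed) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ListingResetSketch {

    // Hypothetical string names mirroring the constants quoted in this thread;
    // the real processor compares PropertyDescriptor objects instead.
    static final Set<String> RESET_PROPERTIES = Set.of(
            "DIRECTORY", "RECURSE", "FILE_FILTER", "PATH_FILTER",
            "MIN_AGE", "MAX_AGE", "MIN_SIZE", "MAX_SIZE", "IGNORE_HIDDEN_FILES");

    private final Map<String, String> properties = new HashMap<>();

    // Modification time (ms) of the newest file already listed; -1 means "nothing listed yet".
    private long latestListedTimestamp = -1L;

    static boolean isListingResetNecessary(String property) {
        return RESET_PROPERTIES.contains(property);
    }

    void onPropertyModified(String property, String newValue) {
        properties.put(property, newValue);
        if (isListingResetNecessary(property)) {
            // Dropping the watermark makes every matching file eligible again.
            latestListedTimestamp = -1L;
        }
    }

    // Only files newer than the watermark are listed.
    boolean shouldList(long lastModifiedMillis) {
        return lastModifiedMillis > latestListedTimestamp;
    }

    void recordListed(long lastModifiedMillis) {
        latestListedTimestamp = Math.max(latestListedTimestamp, lastModifiedMillis);
    }

    public static void main(String[] args) {
        ListingResetSketch state = new ListingResetSketch();
        long oldFile = 1_000L;
        state.recordListed(oldFile);
        System.out.println(state.shouldList(oldFile)); // false: already listed
        state.onPropertyModified("MIN_AGE", "60 sec");
        System.out.println(state.shouldList(oldFile)); // true: state was reset
    }
}
```

In this model, changing Minimum File Age from, say, 0 to 60 sec clears the watermark, so the next run lists old files again. That matches the "flood of additional files" described later in the thread.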
MIN_AGE is not the only property: ListFile resets its state if any of the following properties are changed. So, after reconfiguring any of them, you may see the same files listed again.

    return DIRECTORY.equals(property)
            || RECURSE.equals(property)
            || FILE_FILTER.equals(property)
            || PATH_FILTER.equals(property)
            || MIN_AGE.equals(property)
            || MAX_AGE.equals(property)
            || MIN_SIZE.equals(property)
            || MAX_SIZE.equals(property)
            || IGNORE_HIDDEN_FILES.equals(property);

On Tue, Jul 18, 2017 at 10:32 AM, Koji Kawamura wrote:
> Hi James, Pierre,
>
> ListFile resets its state (including the latest entry it has listed)
> when Minimum File Age is changed. ListFile.isListingResetNecessary
> implements this behavior.
>
> Thanks,
> Koji
>
> On Tue, Jul 18, 2017 at 2:42 AM, Pierre Villard wrote:
>> Hi James,
>>
>> This parameter should not change the behavior of the processor regarding
>> files already listed in previous trigger executions of the processor. Is
>> it possible that the old files were modified by another process? That
>> would explain why the processor listed the files one more time. If you
>> can reproduce the issue, that's certainly a bug IMO.
>>
>> Thanks
>> Pierre
>>
>> 2017-07-14 18:13 GMT+02:00 James McMahon:
>>>
>>> Joe, I have a follow-up question. If I set Minimum File Age to 60 sec in
>>> my ListFile processor, that does not override the usual behavior in which
>>> ListFile excludes from its output any file preceding its previous run
>>> cycle, does it? I ask because I set Minimum File Age to 60 sec and have
>>> seen a flood of additional files. Many of those files have date stamps
>>> preceding the ListFile runs that have been executing over the last few
>>> days. I am trying to determine why this might be the case.
>>>
>>> Thanks for any thoughts or insights.
>>>
>>> On Fri, Jul 14, 2017 at 10:11 AM, James McMahon wrote:
>>>>
>>>> Thank you very much Joe.
>>>> This is very good to know. We are indeed working with CentOS, so I can
>>>> explore with my users the practice of using a '.' prefix while working
>>>> with a file and renaming it when done.
>>>>
>>>> But I'm not certain I can impose that requirement on my users, or depend
>>>> on them to always follow it. So, in combination with that, I will use a
>>>> Minimum File Age of 30 or 60 seconds in my ListFile processors. That
>>>> should be more than ample margin, and since my ListFile runs with the
>>>> default Yield Duration of 1 sec, the files will be picked up quite
>>>> rapidly in a subsequent run of the processor. Thanks again.
>>>>
>>>> On Fri, Jul 14, 2017 at 9:53 AM, Joe Witt wrote:
>>>>>
>>>>> Jim,
>>>>>
>>>>> Ultimately this comes down to whether any consuming process (not just
>>>>> NiFi) can reliably know that a given file is 'ready to be consumed'.
>>>>> If the writer of those files offers no 'protocol' by which you can
>>>>> know, then unfortunately it comes down to 'having a pretty good guess'
>>>>> that they're done.
>>>>>
>>>>> One of the simpler and more reliable ways to know the file writer is
>>>>> done is that the writer changes the name of the file when it is done.
>>>>> The most common pattern is to write the file under a name prefixed
>>>>> with a dot ('.'). On *nix such a file is usually considered 'hidden',
>>>>> and the ListFile processor ignores hidden files by default.
>>>>>
>>>>> Beyond that, the options are less awesome. The next most reliable
>>>>> option is probably the file age (based on modification time), which
>>>>> ListFile makes available to you. The problem is that you're not
>>>>> guaranteed the modification time will be updated, and system
>>>>> administrators can disable modification-time updates if they want to.
>>>>> However, if that is reliable in your case, you could use it.
>>>>>
>>>>> We could also add an option to the processor that makes listing slower:
>>>>> it would scan a couple of times to see whether the file size is still
>>>>> changing. But this is also not very reliable.
>>>>>
>>>>> In short, the processor gives you options to handle this, but keep in
>>>>> mind that unless there is some reliable 'protocol' here, you are
>>>>> basically guessing at whether the file is ready. This is a 'how file
>>>>> I/O works' thing more than a 'what NiFi can do' thing.
>>>>>
>>>>> Thanks
>>>>> Joe
>>>>>
>>>>> On Fri, Jul 14, 2017 at 9:39 AM, James McMahon wrote:
>>>>> > A fundamental question was asked by one of the consumers who depend
>>>>> > on my NiFi workflows for the transfer of critical data. I wasn't
>>>>> > entirely certain of the answer and feel that I should understand
>>>>> > this better.
>>>>> >
>>>>> > When using a ListFile/FetchFile combination, or even a simple GetFile
>>>>> > processor, how does NiFi ensure that it does not ingest from a
>>>>> > targeted directory any files that an external process may still be
>>>>> > writing to or editing? Is it bound by file locks established at the
>>>>> > system level by those external processes?
>>>>> >
>>>>> > Thanks in advance for any insights that help me better explain this
>>>>> > to consumers of my NiFi workflows. -Jim
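Joe's two options can be sketched with plain java.nio.file: a writer that uses a dot-prefixed name and renames the file when it is finished, and a reader that skips hidden files and files younger than a minimum age. This is a hypothetical illustration, not NiFi code. writeThenRename and looksReady are made-up helper names, and the 60-second minimum age is simply the value James mentions. As Joe points out, the age check is only a guess, because modification times are not guaranteed to be updated.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

public class ReadyFileSketch {

    // Writer side (hypothetical helper): write under a hidden ".name", then rename when done.
    static Path writeThenRename(Path dir, String name, String content) throws IOException {
        Path temp = dir.resolve("." + name);
        Path target = dir.resolve(name);
        Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException instead of silently
        // falling back to copy+delete (e.g. when moving across file systems).
        return Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side (hypothetical helper): a heuristic, not a guarantee.
    static boolean looksReady(Path file, Duration minAge, Instant now) throws IOException {
        String name = file.getFileName().toString();
        if (name.startsWith(".")) {
            return false; // by the dot-prefix convention, still being written
        }
        FileTime modified = Files.getLastModifiedTime(file);
        Duration age = Duration.between(modified.toInstant(), now);
        return age.compareTo(minAge) >= 0; // old enough to assume the writer is done
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ready-sketch");
        Path done = writeThenRename(dir, "data.csv", "a,b\n1,2\n");
        Duration minAge = Duration.ofSeconds(60);
        // Just written, so younger than the minimum age.
        System.out.println(looksReady(done, minAge, Instant.now())); // false
        // Pretend a little over a minute has passed.
        System.out.println(looksReady(done, minAge, Instant.now().plusSeconds(61))); // true
    }
}
```

The rename is what makes the writer side reliable: a reader either sees the hidden temporary name or the complete final file. The age check on the reader side only reduces the risk when writers do not follow that convention.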
