Hi James, Pierre,

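ListFile resets its state (including its record of the latest entry it
listed) when Minimum File Age is changed, so after the change it lists
matching files again, including ones it had already listed.
ListFile.isListingResetNecessary implements this behavior.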
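Below is a toy Python model of that reset. It is not NiFi's Java code: the class, method, and state names are hypothetical. Only Minimum File Age is confirmed here as a property that triggers the reset; "Input Directory" is in the set purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical set of properties whose change wipes the listing state.
# Only "Minimum File Age" is confirmed in this thread; "Input Directory"
# is included for illustration.
RESET_PROPERTIES = {"Input Directory", "Minimum File Age"}


@dataclass
class ListingState:
    # Modification time (ms since epoch) of the newest file listed so far.
    latest_listed_ms: int = 0


@dataclass
class ToyListFile:
    properties: Dict[str, Any]
    state: ListingState = field(default_factory=ListingState)

    def is_listing_reset_necessary(self, name: str) -> bool:
        return name in RESET_PROPERTIES

    def on_property_modified(self, name: str, old: Any, new: Any) -> None:
        self.properties[name] = new
        if old != new and self.is_listing_reset_necessary(name):
            # Forget the high-water mark, so files that were already listed
            # become eligible to be listed again.
            self.state = ListingState()
```

For example, after `p.on_property_modified("Minimum File Age", "0 sec", "60 sec")` the stored mark drops back to 0, which is why older files show up again.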

Thanks,
Koji

On Tue, Jul 18, 2017 at 2:42 AM, Pierre Villard
<[email protected]> wrote:
> Hi James,
>
> This parameter should not change how the processor treats files already
> listed in previous trigger executions. Is it possible that the old files
> were somehow modified by another process? That would explain why the
> processor listed them again. If you can reproduce the issue, that's
> certainly a bug IMO.
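A simplified Python sketch of timestamp-based listing shows why a modified file would be listed again. It assumes the stored state is just the newest modification time seen so far, so any file whose mtime moves past that mark passes the filter a second time. The function name is hypothetical, and the real processor has additional logic that this ignores.

```python
from typing import Iterable, List, Tuple

# (path, modification time in ms since epoch)
Entry = Tuple[str, int]


def list_new_files(entries: Iterable[Entry],
                   last_listed_ms: int) -> Tuple[List[Entry], int]:
    """Return the entries modified after the stored high-water mark,
    oldest first, together with the updated high-water mark."""
    new = sorted((e for e in entries if e[1] > last_listed_ms),
                 key=lambda e: e[1])
    # Advance the mark to the newest entry emitted; keep it if none were.
    mark = new[-1][1] if new else last_listed_ms
    return new, mark
```

A file "b" listed at mtime 200 is skipped on the next run, but if another process touches it and its mtime becomes 350, it is emitted again.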
>
> Thanks
> Pierre
>
> 2017-07-14 18:13 GMT+02:00 James McMahon <[email protected]>:
>>
>> Joe, I have a follow-up question. If I set Minimum File Age to 60 sec in
>> my ListFile processor, that does not override the usual behavior in which
>> ListFile leaves out of its listing any file older than its previous run,
>> does it? I ask because I set Minimum File Age to 60 sec and have seen a
>> flood of additional files. Many of those files have date stamps preceding
>> the ListFile runs that have been executing over the last few days. I am
>> trying to determine why this might be the case.
>>
>> Thanks for any thoughts or insights.
>>
>> On Fri, Jul 14, 2017 at 10:11 AM, James McMahon <[email protected]>
>> wrote:
>>>
>>> Thank you very much Joe. This is very good to know. We are indeed working
>>> with CentOS, so I can explore with my users the option of using a '.'
>>> prefix while working on the file and renaming it when done.
>>>
>>> But I'm not certain I can impose that requirement on my users, or depend
>>> on them to always follow it. So in combination with that I will use a
>>> Minimum File Age of 30 or 60 seconds in my ListFile processors. That
>>> should be ample margin, and since my ListFile runs with the default Yield
>>> Duration of 1 sec, the files will be picked up quite rapidly in a
>>> subsequent run. Thanks again.
>>>
>>> On Fri, Jul 14, 2017 at 9:53 AM, Joe Witt <[email protected]> wrote:
>>>>
>>>> Jim,
>>>>
>>>> Ultimately this comes down to whether any consuming process (not just
>>>> NiFi) can reliably know that a given file is 'ready to be consumed'.
>>>> If the writer of those files offers no 'protocol' by which you can
>>>> know, then unfortunately it comes down to 'having a pretty good guess'
>>>> that they're done.
>>>>
>>>> One of the simpler and more reliable ways to know the file writer is
>>>> done is that the writer changes the name of the file when it is done.
>>>> The most common pattern here is to write the file with a name prefixed
>>>> with a 'dot' (in *nix this is often considered a 'hidden' file) and
>>>> rename it once finished.  The ListFile processor ignores such hidden
>>>> files by default.
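A minimal Python sketch of that writer-side convention follows. It assumes a POSIX system where the hidden temporary name and the final name are on the same filesystem, so the rename is atomic. `write_then_rename` and `visible_files` are hypothetical helpers; `visible_files` only mimics the idea of skipping dot-prefixed names, not ListFile's actual filtering code.

```python
import os
import tempfile


def write_then_rename(directory: str, name: str, data: bytes) -> str:
    """Write to a hidden '.name' file, then rename it to 'name' when done."""
    tmp_path = os.path.join(directory, "." + name)
    final_path = os.path.join(directory, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    # On POSIX, a rename within one filesystem is atomic: a reader sees
    # either no 'name' at all or the complete file, never a partial one.
    os.replace(tmp_path, final_path)
    return final_path


def visible_files(directory: str) -> list:
    """List regular files, skipping dot-prefixed ('hidden') names."""
    return sorted(
        n for n in os.listdir(directory)
        if not n.startswith(".") and os.path.isfile(os.path.join(directory, n))
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_then_rename(d, "report.csv", b"a,b\n1,2\n")
        print(visible_files(d))  # ['report.csv']
```

The catch, as noted in the thread, is that every tool writing into the directory has to follow this convention; the reader cannot enforce it.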
>>>>
>>>> After that it is a set of less awesome options.  The next most
>>>> reliable option is probably to use the file age (based on
>>>> modification time), which ListFile makes available to you.  The
>>>> problem with this is that you're not guaranteed the modification time
>>>> will be updated, and administrators of systems can disable updates to
>>>> it if they want to.  However, if in your case this is a reliable
>>>> option, you could use that.
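A sketch of that age check in Python, assuming the filesystem updates the modification time on every write. `is_old_enough` is a hypothetical helper, not NiFi's implementation; the optional `now` argument just makes it easy to test. A 60-second minimum age only helps if the writer never pauses for longer than that in the middle of a write.

```python
import os
import tempfile
import time
from typing import Optional


def is_old_enough(path: str, min_age_s: float,
                  now: Optional[float] = None) -> bool:
    """True if the file's mtime is at least min_age_s seconds in the past."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime >= min_age_s


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        # A file that was just created is too young to be picked up.
        print(is_old_enough(f.name, 60))
```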
>>>>
>>>> We could also add something to the processor that makes listings
>>>> slower: it would scan a couple of times to see whether the file size
>>>> is still changing.  But this is also not very reliable.
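The size-polling idea could look like the hypothetical Python helper below (this is not an existing ListFile option). It samples the size a few times and reports whether it stayed the same; `size_fn` is injectable only so the example can be tested without a real writer. A writer that stalls between samples still defeats it, which is why it is not very reliable.

```python
import os
import time
from typing import Callable


def size_is_stable(path: str, checks: int = 3, interval_s: float = 1.0,
                   size_fn: Callable[[str], int] = os.path.getsize) -> bool:
    """Sample the file size `checks` times, `interval_s` apart.
    Return True only if every sample matches the first one."""
    first = size_fn(path)
    for _ in range(checks - 1):
        time.sleep(interval_s)
        if size_fn(path) != first:
            return False  # still growing (or shrinking): not ready yet
    return True
```

Each extra check adds `interval_s` of latency per file, which is the "slower listings" trade-off.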
>>>>
>>>> In short, the processor gives you options to handle this, but you also
>>>> have to keep in mind that unless there is some reliable 'protocol'
>>>> here, you are basically guessing at whether the file is ready.  This is
>>>> more a 'how file IO works' thing than a 'what NiFi can do' thing.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>>
>>>> On Fri, Jul 14, 2017 at 9:39 AM, James McMahon <[email protected]>
>>>> wrote:
>>>> > A fundamental question was asked by one of the consumers who depend on
>>>> > my NiFi workflows for transfer of critical data. I wasn't entirely
>>>> > certain of the answer and feel that I really should understand this
>>>> > better.
>>>> >
>>>> > When using a ListFile/FetchFile combo or even a simple GetFile
>>>> > processor, how does NiFi ensure that it does not ingest from a targeted
>>>> > directory any files that an external process may still be writing to or
>>>> > editing? Is it bound by file locks that have been established at the
>>>> > system level by those external processes?
>>>> >
>>>> > Thanks in advance for any insights that help me better explain this to
>>>> > consumers of my NiFi workflows.  -Jim
>>>
>>>
>>
>
