Joe, I have a follow-up question. If I set Minimum File Age to 60 sec in my
ListFile processor, that does not override the typical behavior in which
ListFile excludes from its listing any file whose timestamp precedes its
previous run cycle, does it? I ask because I set Minimum File Age to 60 sec
and have seen a flood of additional files. Many of those files have date
stamps preceding the ListFile runs that have been executing over the
course of the last few days. I am trying to determine why this might be
the case.

Thanks for any thoughts or insights.

On Fri, Jul 14, 2017 at 10:11 AM, James McMahon <[email protected]>
wrote:

> Thank you very much Joe. This is very good to know. We are indeed working
> with CentOS, and so I can explore with my users using a '.' prefix while
> working with the file, and renaming it when done.
>
> But I'm not certain I can levy that requirement on my users, or depend on
> them to always enforce it. So in combination with that I will use a Minimum
> File Age of 30 or 60 seconds in my ListFile processors. That should be more
> than ample margin, and since my ListFile runs with the default Yield
> duration of 1 sec, the files will be picked up in a subsequent processor
> run quite rapidly. Thanks again.
>
> On Fri, Jul 14, 2017 at 9:53 AM, Joe Witt <[email protected]> wrote:
>
>> Jim,
>>
>> Ultimately this comes down to whether any consuming process (not just
>> NiFi) can reliably know that a given file is 'ready to be consumed'.
>> If the writer of those files offers no 'protocol' by which you can
>> know then unfortunately it is about 'having a pretty good guess' that
>> they're done.
>>
>> One of the simpler and more reliable ways to know the file writer is
>> done is that the file writer changes the name of the file when it is
>> done.  The most common pattern here is that they write the file with a
>> name prefixed with a 'dot'.  In *nix this is often considered a 'hidden'
>> file.  The ListFile processor ignores hidden files by default.
>>
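A minimal sketch of the writer-side rename pattern described above, in
Python. This is not NiFi code; the function name write_then_publish and the
file names are hypothetical, and it assumes the hidden and final names live
in the same directory so that the rename stays on one filesystem.

```python
import os


def write_then_publish(directory, final_name, data):
    """Write to a hidden '.name' file, then rename it once writing is done.

    Hypothetical helper: a consumer that skips dot-files never sees the
    file under its final name until the content is complete.
    """
    hidden_path = os.path.join(directory, "." + final_name)
    final_path = os.path.join(directory, final_name)

    with open(hidden_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing

    # Rename within the same directory; on POSIX this is an atomic step.
    os.replace(hidden_path, final_path)
    return final_path
```

A writer would call something like write_then_publish("/data/in",
"report.csv", payload), where the directory and file name are placeholders.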
>> After that it is a set of less awesome options.  The next most
>> reliable option most likely is to use the file age (based on
>> modification time) and ListFile makes this available to you.  The
>> problem with this is that you're not guaranteed the modification time
>> will be updated, and administrators of systems can disable updates to
>> modification time if they want to.  However, if in your case this is a
>> reliable option you could use that.
>>
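A rough sketch of the file-age heuristic, in Python. This is not ListFile's
implementation; the helper name list_files_older_than and its parameters
are assumptions made for illustration. It treats a file as ready once its
modification time is at least min_age_seconds in the past, and it skips
dot-files.

```python
import os
import time


def list_files_older_than(directory, min_age_seconds, now=None):
    """Return names of regular, non-hidden files at least min_age_seconds old.

    Hypothetical helper: age is measured from the file's modification time,
    so it is only as trustworthy as that timestamp.
    """
    now = time.time() if now is None else now
    ready = []
    for entry in os.scandir(directory):
        if not entry.is_file() or entry.name.startswith("."):
            continue  # skip directories and hidden (in-progress) files
        age = now - entry.stat().st_mtime
        if age >= min_age_seconds:
            ready.append(entry.name)
    return sorted(ready)
```

If the writer pauses for longer than the chosen age, a file that is still
being written would pass this check, which is why it remains a guess rather
than a guarantee.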
>> We could also add something to the processor to make listings slower
>> whereby it would scan a couple times to see if the file size is still
>> changing.  But this is also not very reliable.
>>
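The repeated-scan idea could look roughly like the Python sketch below.
Per the message above, this is something that could be added rather than
an existing processor feature, so the function name size_is_stable and its
parameters are purely hypothetical.

```python
import os
import time


def size_is_stable(path, checks=3, interval_seconds=1.0):
    """Return True if the file size is unchanged across `checks` samples.

    Hypothetical helper: samples the size, sleeps, and samples again. Any
    change between consecutive samples means the file is still growing.
    """
    previous = os.path.getsize(path)
    for _ in range(checks - 1):
        time.sleep(interval_seconds)
        current = os.path.getsize(path)
        if current != previous:
            return False
        previous = current
    return True
```

A writer that stalls for longer than checks * interval_seconds would still
pass, so this slows listings down without making them fully reliable.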
>> In short, the processor gives you options to handle this but you also
>> have to keep in mind that unless there is some reliable 'protocol'
>> here you are basically guessing at whether the file is ready.  This is
>> a 'how file IO works' thing more than a 'what NiFi can do' thing.
>>
>> Thanks
>> Joe
>>
>>
>> On Fri, Jul 14, 2017 at 9:39 AM, James McMahon <[email protected]>
>> wrote:
>> > A fundamental question was asked by one of the consumers who depend on my
>> > NiFi workflows for transfer of critical data. I wasn't entirely certain of
>> > the answer and feel that I really should better understand this.
>> >
>> > When using a ListFile/FetchFile combo or even a simple GetFile processor,
>> > how does NiFi ensure that it does not ingest from a targeted directory any
>> > files that an external process may still be writing to or editing? Is it
>> > bound by file locks that have been established at the system level by those
>> > external processes?
>> >
>> > Thanks in advance for any insights that help me better explain this to
>> > consumers of my NiFi workflows.  -Jim
>>
>
>
