GetFile has no persistence.
Actually it has, but it's called your hard drive. :)

If you take a look at the documentation:
*Keep Source File* - "If true, the file is not deleted after it has been
copied to the Content Repository; this causes the file to be picked up
continually and is useful for testing purposes. If not keeping original
NiFi will need write permissions on the directory it is pulling from
otherwise it will ignore the file."

You can see that it's going to get the same files over and over again
unless you configure it to delete the already processed ones.

The reason I suggested the ListFile/FetchFile combination is that ListFile
can be triggered once; the metadata (filenames) is then stored in your queue,
and FetchFile can process the files later.
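
As a rough illustration of the idea (plain Python, not NiFi code; the
function names list_files and fetch_next are made up for this sketch): the
listing step only records file paths, so the queue stays small even if the
data itself is large, and the queue length tells you how many files are left.

```python
from collections import deque
from pathlib import Path
import tempfile


def list_files(directory, pattern="*.json"):
    """Listing step: record only file paths (metadata), not file contents."""
    return deque(sorted(Path(directory).glob(pattern)))


def fetch_next(queue):
    """Fetch step: take one listed path off the queue and read its content."""
    path = queue.popleft()
    return path, path.read_bytes()


if __name__ == "__main__":
    # Demo on a throwaway directory with three small JSON files.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            Path(tmp, f"doc{i}.json").write_text("{}")

        queue = list_files(tmp)  # triggered once
        while queue:
            path, data = fetch_next(queue)
            # len(queue) is the number of files still waiting to be fetched
            print(f"fetched {path.name}, {len(queue)} left")
```

Each file is listed exactly once, so nothing is picked up twice, and the loop
ends when the queue is empty.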

On Thu, Feb 18, 2021 at 2:39 PM Jean-Sebastien Vachon <
jsvac...@brizodata.com> wrote:

> OK I understand your point.. sorry (early morning) 😉
>
> I am kind of stuck with the GetFile processor for now. Is there a way to
> know how many files are left to process?
>
> Will it go on forever, or will it stop streaming once all files have been
> processed? (There are no new files in the folder; everything was there at
> the beginning.)
>
> Thanks
>
>
> *Jean-Sébastien Vachon *
> Co-Founder & Architect
>
>
> *Brizo Data, Inc. www.brizodata.com
> <https://outlook.office365.com/mail/options/mail/messageContent/www.brizodata.com>
> *
> ------------------------------
> *From:* Jean-Sebastien Vachon <jsvac...@brizodata.com>
> *Sent:* Thursday, February 18, 2021 8:34 AM
> *To:* users@nifi.apache.org <users@nifi.apache.org>
> *Subject:* Re: Questions about the GetFile processor
>
> Thanks for your comment. However, I can't queue everything as the total
> size of the data is around 560GB.
> Right now, I am using a GetFile processor and it has been running for a
> few days. If I look at my end point, it looks like it should be done pretty
> soon but data is still
> streaming in at the same rate so I was wondering if the processor
> remembers every single file it has already processed or if it is simply
> going through all the files alphabetically or in whatever order it decides.
>
> Thanks
>
>
> ------------------------------
> *From:* Arpad Boda <ab...@apache.org>
> *Sent:* Thursday, February 18, 2021 8:29 AM
> *To:* users@nifi.apache.org <users@nifi.apache.org>
> *Subject:* Re: Questions about the GetFile processor
>
> You can use the combination of ListFile and FetchFile.
> In the queue between the two you are going to see the number of
> (flow)files left to be processed.
>
> On Thu, Feb 18, 2021 at 2:14 PM Jean-Sebastien Vachon <
> jsvac...@brizodata.com> wrote:
>
> Hi all,
>
> If I configure a GetFile processor to list all JSON files under a given
> folder, will it stop sending flowfiles once it has processed all files?
> My folder contains thousands of files and the processor reads them in
> small batches (10) every 30s.
>
> Is there a way to know how many files are left to process?
>
> Thanks
>
>
>
>
