Thank you, Joe. This sounds promising and I will try to apply the example.
Can you provide the link to what you refer to as our wiki? I'll search for
the example.

On Tue, Apr 14, 2020 at 8:08 AM Joe Witt <[email protected]> wrote:

> James
>
> Using the provenance events from this processor is the best way.  Grab all
> receive events for the time period of interest.
>
> You can do this in a few ways, but one that works well is to send provenance
> events via a reporting task, filter the events for that component, write those
> out to a file or set of files, and review them.  I think we have an example of
> this on our wiki.
>
> Thanks
>
> On Tue, Apr 14, 2020 at 7:57 AM James McMahon <[email protected]>
> wrote:
>
>> I have an issue with a ListFile processor. It does not appear to be
>> consuming all the raw data files that show up throughout the day in a
>> landing directory. My count at end of the day is less than the count of all
>> the files in the directory at end of the day. I suspect it has to do with
>> the way the ListFile has been configured (right now we only accept files
>> that are 30 minutes old or older), or it has to do with the fact that large
>> numbers of files can arrive at the same hh:mm, differentiated only by seconds
>> or milliseconds.  Perhaps ListFile is recording its state only to the
>> hour-minute or hour-minute-second (I notice that all millisecond values in
>> the epoch time are 000 in View State), and so when ListFile runs in its
>> following cycle it overlooks all the other files that share hh:mm, but are
>> later in time by some seconds or milliseconds on the file time? I'm
>> grasping for a logical cause at this point.
>>
>> I want to do a comparison of what I have read in so far today against an
>> exhaustive list of today's directory. My intention is that such a
>> comparison should flag gaps, which then may lead me to a cause.
>>
>> I have routed the ListFile success relationship to a queue that retains
>> its flowfiles for 24 hours, and I started it after all of yesterday's files
>> had stopped arriving (the point being that the queue will only hold
>> flowfiles from today's directory). Right now it totals 16,231 flowfiles.
>> The "read only" directory on the Linux system has nearly 20,000 files in
>> it. Looking at the queue from the UI isn't quite what I require: it only
>> lets me view 100 flowfiles, and I can't output the list.
>>
>> Can I use the API or some other option to generate the complete list of
>> flowfiles in that queue? I hope to output a list that includes filename,
>> file.lastModifiedTime, and file.creationTime.
>> Thank you in advance for your help.
>>
>>
>>
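
For the comparison described in the original question (what ListFile has
emitted so far versus an exhaustive listing of today's directory), a minimal
Python sketch might look like the following. It assumes the listed filenames
have already been exported to a plain text file, one name per line (for
example, extracted from the provenance events Joe mentions). The function
names, the script name, and the file layout are hypothetical and not part of
NiFi.

```python
import sys
from pathlib import Path


def load_listed_names(path):
    """Read one filename per line, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def find_unlisted(directory, listed_names):
    """Return (name, mtime in epoch ms) for files on disk that were never listed."""
    missing = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and entry.name not in listed_names:
            mtime_ms = entry.stat().st_mtime_ns // 1_000_000
            missing.append((entry.name, mtime_ms))
    # Sort by modification time so clusters of skipped files stand out.
    return sorted(missing, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical usage: python find_gaps.py /path/to/landing listed.txt
    landing_dir, listed_file = sys.argv[1], sys.argv[2]
    for name, mtime_ms in find_unlisted(landing_dir, load_listed_names(listed_file)):
        print(f"{mtime_ms}\t{name}")
```

Printing the modification time in epoch milliseconds next to each missing
name makes it easier to see whether the skipped files cluster around
timestamps shared with files that were listed, which is one of the causes
suspected above.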
