James,

Using the provenance events from this processor is the best way to get that list. Grab all receive events for the time period of interest.
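A minimal Python sketch of grabbing those events and turning them into a reviewable list. It assumes the events have already been exported as JSON records whose keys are `componentId`, `eventType`, `timestampMillis` and `updatedAttributes`. Those key names, the function names and the CSV layout are all assumptions, not something confirmed in this thread. The event type is a parameter because the reply says RECEIVE; if the processor records CREATE events for new listings instead, pass that type.

```python
import csv
from typing import Iterable

# Columns for the review file: the attributes asked about in the question.
COLUMNS = ["filename", "file.lastModifiedTime", "file.creationTime"]


def receive_events_for_component(events: Iterable[dict], component_id: str,
                                 start_ms: int, end_ms: int,
                                 event_types: tuple = ("RECEIVE",)) -> list:
    """Keep events from one component whose type is in event_types and whose
    timestamp satisfies start_ms <= timestamp < end_ms.

    The keys componentId, eventType and timestampMillis are ASSUMED to match
    the exported provenance JSON records.
    """
    return [
        event for event in events
        if event.get("componentId") == component_id
        and event.get("eventType") in event_types
        and start_ms <= event.get("timestampMillis", -1) < end_ms
    ]


def export_rows(events: Iterable[dict]) -> list:
    """One row per event; 'updatedAttributes' is ASSUMED to hold the attributes."""
    rows = []
    for event in events:
        attrs = event.get("updatedAttributes") or {}
        rows.append([attrs.get(col, "") for col in COLUMNS])
    return rows


def write_csv(events: Iterable[dict], path: str) -> None:
    """Write a header line plus one row per event to a CSV file for review."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(COLUMNS)
        writer.writerows(export_rows(events))
```

For example, `write_csv(receive_events_for_component(events, "<ListFile component id>", start_ms, end_ms), "export.csv")` writes one line per listed file for the chosen window; the component id and file name here are placeholders.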
You can do this in a few ways, but one that works well is to send the provenance events out via a reporting task, filter the events for that component, write them out to a file or set of files, and review them. I think we have an example of this on our wiki.

Thanks

On Tue, Apr 14, 2020 at 7:57 AM James McMahon wrote:
> I have an issue with a ListFile processor. It does not appear to be
> consuming all the raw data files that show up throughout the day in a
> landing directory. My count at the end of the day is less than the count of
> all the files in the directory at the end of the day. I suspect it has to do
> with the way ListFile has been configured (right now we only accept files
> that are 30 minutes old or older), or with the fact that large numbers of
> files can arrive at the same hh:mm, differing only by seconds or
> milliseconds. Perhaps ListFile records its state only to the minute or
> second (I notice that all millisecond values in the epoch times shown in
> View State are 000), so that on its next run ListFile overlooks the other
> files that share the same hh:mm but have file times later by some seconds or
> milliseconds? I'm grasping for a logical cause at this point.
>
> I want to compare what I have read in so far today against an exhaustive
> listing of today's directory. My intention is that such a comparison will
> flag gaps, which may then lead me to a cause.
>
> I have saved the results of the ListFile success relationship to a queue
> that persists them for 24 hours. I started this after all of yesterday's
> files had stopped arriving, so the queue only holds flowfiles from today's
> directory. Right now it totals 16,231 flowfiles. The "read only" directory
> on the Linux system has nearly 20,000 files in it. Looking at the queue
> from the UI isn't quite what I need: it only lets me view 100 flowfiles,
> and I can't output the list.
>
> Can I use the API or some other option to generate the complete list of
> flowfiles in that queue? I would like to output a list that includes
> filename, file.lastModifiedTime, and file.creationTime.
> Thank you in advance for your help.
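For the comparison described in the question, a minimal Python sketch. It reads filenames back from an export CSV with a `filename` header column (a hypothetical layout, for example one produced from filtered provenance events), lists the landing directory, and reports files on disk that never appeared in the flow. The missing files are sorted by modification time, so gaps that cluster within the same second or minute stand out. It assumes a flat, non-recursive directory with unique file names; all function names are illustrative.

```python
import csv
import os


def exported_filenames(csv_path: str) -> set:
    """Filenames in an export CSV that has a 'filename' header column (assumed layout)."""
    with open(csv_path, newline="") as fh:
        return {row["filename"] for row in csv.DictReader(fh)}


def directory_filenames(directory: str) -> set:
    """Names of regular files directly inside the directory (no recursion)."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def find_gaps(exported: set, on_disk: set) -> list:
    """Files present on disk that never showed up in the export, sorted by name."""
    return sorted(on_disk - exported)


def with_mtime_ms(directory: str, names: list) -> list:
    """Pair each file with its modification time in epoch milliseconds, oldest
    first, so missing files that share an hh:mm (or a second) group together."""
    stamped = [
        (name, os.stat(os.path.join(directory, name)).st_mtime_ns // 1_000_000)
        for name in names
    ]
    return sorted(stamped, key=lambda pair: pair[1])
```

With placeholder paths, `gaps = find_gaps(exported_filenames("export.csv"), directory_filenames("/data/landing"))` followed by `with_mtime_ms("/data/landing", gaps)` yields each missing file with its millisecond timestamp, which can be checked against the timestamp that ListFile shows in View State.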
