Thank you, Joe. This sounds promising, and I will try to apply the example. Can you provide a link to the wiki you mention? I'll search it for the example.

On Tue, Apr 14, 2020 at 8:08 AM Joe Witt <[email protected]> wrote:

> James
>
> Using the provenance events from this processor is the best way. Grab all
> receive events for the time period of interest.
>
> You can do this in a few ways but one that works well is to send prov
> events via reporting task, filter events for that component, write those
> out to a file or set of files and review. I think we have an example of
> this on our wiki.
>
> Thanks
>
> On Tue, Apr 14, 2020 at 7:57 AM James McMahon <[email protected]>
> wrote:
>
>> I have an issue with a ListFile processor. It does not appear to be
>> consuming all the raw data files that show up throughout the day in a
>> landing directory. My count at end of the day is less than the count of all
>> the files in the directory at end of the day. I suspect it has to do with
>> the way the ListFile has been configured (right now we only accept files
>> that are 30 minutes old or older), or it has to do with the fact that large
>> numbers of files can arrive at the same hh:mm, differentiated only by
>> seconds or milliseconds. Perhaps ListFile is recording its state only to the
>> hour-minute or hour-minute-second (I notice that all millisecond values in
>> the epoch time are 000 in View State), and so when ListFile runs in its
>> following cycle it overlooks all the other files that share hh:mm, but are
>> later in time by some seconds or milliseconds on the file time? I'm
>> grasping for a logical cause at this point.
>>
>> I want to do a comparison of what I have read in so far today against an
>> exhaustive list of today's directory. My intention is that such a
>> comparison should flag gaps, which then may lead me to a cause.
>>
>> I have saved to a queue that persists the results of ListFile Success
>> path for 24 hours, which I started after all files yesterday had stopped
>> arriving (point being, the queue will only have flowfiles in it from
>> today's directory). Right now it totals 16,231 flowfiles. The "read only"
>> directory on the Linux system has nearly 20,000 files in it. Looking at the
>> queue from the UI isn't quite what I require: it only lets me view 100
>> flowfiles, and I can't output the list.
>>
>> Can I use the API or other option to generate the complete list of
>> flowfiles in that queue? I hope to output a list that includes Filename,
>> file.lastModifiedTime, and file.creationTime.
>> Thank you in advance for your help.
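
The comparison described in the original question (what has been listed so far versus everything in today's directory) could be sketched roughly as below. This is only a minimal illustration, not anything from NiFi itself: it assumes the ingested filenames have already been exported to a plain text file, one name per line (for instance from the filtered provenance events suggested above). The function names `load_listed_names` and `find_gaps` are hypothetical, and the 30-minute default simply mirrors the minimum file age mentioned in the thread.

```python
import os
import time


def load_listed_names(listing_path):
    """Read one filename per line from a text export of what was ingested.

    The export file and its one-name-per-line layout are assumptions.
    """
    with open(listing_path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def find_gaps(directory, listed_names, min_age_seconds=30 * 60, now=None):
    """Compare a directory against the set of already-listed filenames.

    Returns two lists of (filename, last_modified_epoch_ms) tuples, each
    sorted by modification time:
      missing   - files old enough to have been picked up but not listed
      too_young - unlisted files still younger than the minimum file age
    """
    now = time.time() if now is None else now
    missing, too_young = [], []
    for entry in os.scandir(directory):
        # Skip subdirectories and anything already accounted for.
        if not entry.is_file() or entry.name in listed_names:
            continue
        mtime = entry.stat().st_mtime
        record = (entry.name, int(mtime * 1000))
        if now - mtime < min_age_seconds:
            too_young.append(record)
        else:
            missing.append(record)
    by_time = lambda record: record[1]
    return sorted(missing, key=by_time), sorted(too_young, key=by_time)
```

Calling `find_gaps(landing_dir, load_listed_names(export_file))` with your own paths would return the unlisted files. Keeping the modification time in epoch milliseconds makes it easy to see whether the gaps cluster around files that share the same second or minute, which is the hypothesis raised in the thread.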
