I am using ListFile followed by FetchFile to recursively detect new files as they appear in a large, nested directory structure that grows over time. I need to better understand how this approach scales. What are the practical and performance limitations of using this tandem of processors to feed new files into NiFi? If anyone has used this approach in a large-scale data environment to manage incoming content, I would welcome your thoughts.
Where does ListFile maintain its listing of files and their datetime stamps? Is it held in memory as a hash map? Is it also persisted to one of the NiFi repositories as a backup? My concern is avoiding reprocessing the entire directory structure should that listing ever be lost or destroyed. Thank you in advance for your assistance. -Jim
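For context, my understanding is that ListFile records its listing state through NiFi's state-management facility (configured in conf/state-management.xml) rather than in the flow/content/provenance repositories. Below is a sketch of what I believe the stock standalone configuration looks like; the provider IDs, class names, and directory path are the shipped defaults as I understand them, so please correct me if my assumption about where this state lives is wrong:

```xml
<!-- Sketch of the default conf/state-management.xml (assumed, not verified
     against my running instance). In standalone mode, processor state such as
     ListFile's last-seen timestamps would be written to the local provider's
     directory via a write-ahead log; in a cluster, a ZooKeeper provider is
     used instead. -->
<stateManagement>
    <local-provider>
        <id>local-provider</id>
        <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
        <!-- If this is right, backing up ./state/local would preserve the listing -->
        <property name="Directory">./state/local</property>
    </local-provider>
    <cluster-provider>
        <id>zk-provider</id>
        <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
        <property name="Connect String"></property>
        <property name="Root Node">/nifi</property>
    </cluster-provider>
</stateManagement>
```

If that is accurate, then my recovery question boils down to whether the local state directory (or the ZooKeeper znodes, in a cluster) can simply be backed up and restored to avoid a full re-listing.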
