Hi Jim,

ListFile does not maintain a list of files with datetime stamps. Instead, it stores
just two timestamps: the timestamp of when a listing was last performed, and the
timestamp of the newest file that it has sent out. This is done precisely because
we need it to scale as the input becomes large.
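
To illustrate the idea (this is only a sketch against NiFi's StateManager API, not
ListFile's actual source, and the key names here are invented), the persisted state
is conceptually just a tiny two-entry map:

    // Sketch only: persist two timestamps, no matter how many files were listed.
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.nifi.components.state.Scope;
    import org.apache.nifi.components.state.StateManager;

    public class ListingStateSketch {
        public void saveListingState(final StateManager stateManager, final long lastListingTime,
                final long newestFileTime, final Scope scope) throws IOException {
            final Map<String, String> state = new HashMap<>();
            state.put("last.listing.timestamp", String.valueOf(lastListingTime)); // invented key name
            state.put("newest.file.timestamp", String.valueOf(newestFileTime));   // invented key name
            stateManager.setState(state, scope);
        }
    }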

Where this information is stored depends on a couple of things. ListFile has a
property named "Input Directory Location." If that is set to "Remote" and the NiFi
instance is clustered, then the information is stored in ZooKeeper. This allows the
Processor to run on the Primary Node only and, if a new node is elected Primary, to
pick up where the previous Primary Node left off.
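
A newly elected Primary Node simply reads the same cluster-scoped state back. Again
as a sketch only (the key name below is invented), that looks something like:

    import java.io.IOException;

    import org.apache.nifi.components.state.Scope;
    import org.apache.nifi.components.state.StateManager;
    import org.apache.nifi.components.state.StateMap;

    public class RestoreStateSketch {
        public long restoreNewestFileTimestamp(final StateManager stateManager) throws IOException {
            // Scope.CLUSTER resolves to the ZooKeeper-backed provider when clustered,
            // so whichever node is Primary sees what the previous Primary stored.
            final StateMap stateMap = stateManager.getState(Scope.CLUSTER);
            final String value = stateMap.get("newest.file.timestamp"); // invented key name
            return value == null ? -1L : Long.parseLong(value);
        }
    }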

If the Input Directory Location is set to "Local" (or if NiFi is not clustered),
then the state will be stored by the local State Manager, which is backed by a
write-ahead log. By default it is written to ./state/local, but this can be
configured in conf/state-management.xml. So if you want to be really sure that you
don't lose the information, you could change that location to somewhere with a RAID
configuration for redundancy.
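
For reference, the relevant entry in conf/state-management.xml is the local-provider.
The snippet below is roughly what ships with NiFi by default (double-check it against
your own file); the "Directory" property is what you would point at a redundant volume:

    <local-provider>
        <id>local-provider</id>
        <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
        <!-- Change this to a RAID-backed (or otherwise redundant) path if you want
             extra protection against losing the stored timestamps -->
        <property name="Directory">./state/local</property>
    </local-provider>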

Thanks
-Mark


> On Jan 10, 2017, at 8:38 AM, James McMahon <[email protected]> wrote:
> 
> I am using ListFile followed by FetchFile to recurse and detect new files 
> that show up in a large nested directory structure that grows over time. I 
> need to better understand how this approach scales. What are the practical 
> and the performance limitations to using this tandem of processors for 
> feeding new files to NiFi? If anyone has used this approach in a large-scale 
> data environment to manage new content to NiFi, I would welcome your thoughts.
> 
> Where does ListFile maintain its list of files with datetime stamps? Does 
> this get persisted as a hash map in memory? Is it also persisted into one of 
> the NiFi repositories as a backup? My concern is avoiding having to reprocess 
> the entire directory structure should that list ever get lost or destroyed. 
> 
> Thank you in advance once again for your assistance. -Jim
