I have a number of directories I'm monitoring with ListFile, working in 
Tracking Entities mode, backed by a DistributedMapCacheClientService. That 
works pretty well, except when NiFi gets restarted for some reason or another, 
in which case all the ListFiles return their entire directory again, which 
causes a lot of unnecessary reprocessing. So I'm changing the DMCServer to one 
backed by a persistence directory. This way, when NiFi restarts, all the 
ListFile processors pick up where they left off, as if nothing happened. But it's
not working as I expect, and I'm wondering whether I misunderstand what it's doing.

1. I can use mburgess's dcache.groovy script
(https://community.cloudera.com/t5/Community-Articles/Working-with-a-NiFi-DistributedMapCache/ta-p/248370)
to list and remove cache entries from the DMC. If I stop a particular
ListFile, delete its cache entry, then restart it, it doesn't output any
FlowFiles. I would expect it to list the directory contents again. After all,
its cache entry is gone, so it no longer has a history. Stopping and
restarting the ClientService and Server has no effect. But if I restart NiFi
as a whole, ListFile does list the whole directory again.

2. Similarly, if I have ListFile backed by a DMC with no persistence directory,
change it to a cache that does have a persistence directory, then start
ListFile, it outputs no FlowFiles. In other words, I've switched it from a cache
with history to a different cache with no history, but it behaves as if it still
has a history until I restart NiFi.

So maybe I'm not understanding how ListFile interacts with a 
DistributedMapCache?
