Hi Nicholas,

You need to configure your ListSFTP processor to run only on the primary node (set this in the processor's Scheduling configuration). Then send its flow files to a Remote Process Group (RPG) that points back to an input port on the cluster itself, so the flow files are distributed across all the nodes instead of staying on the primary node. From that input port, the FetchSFTP processor takes care of downloading the files. ListSFTP keeps state (via the Distributed Cache), which ensures you don't list the same file twice, and since only the primary node performs the listing, a given file won't be downloaded by two nodes at the same time.
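As a rough sketch, the flow would look something like this (the input port name is arbitrary, and the FetchSFTP property values shown are what I believe are its usual defaults, relying on the `path`, `filename`, and `sftp.remote.host` attributes that ListSFTP writes):

```
[ListSFTP]                  Scheduling: On primary node
    |                       Distributed Cache Service: (your client service)
    v
[Remote Process Group]      URL: points at this same cluster
    |
    v
(Input Port "fetch-in")     flow files are load-balanced across all 3 nodes here
    |
    v
[FetchSFTP]                 Hostname:    ${sftp.remote.host}
                            Remote File: ${path}/${filename}
```

The key point is that the RPG/input-port hop is what spreads the 100 listed flow files over the cluster, so each node only fetches its share.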
Hope this helps,
Pierre.

2016-12-15 22:13 GMT+01:00 Nicholas Hughes <[email protected]>:

> I'm testing a simple List/Fetch setup on a 3-node cluster. I created a
> DistributedMapCacheServer controller service with the default settings (no
> SSL) and then created a DistributedMapCacheClientService that points at
> one of the cluster hostnames. The ListSFTP processor is set to use the
> Distributed Cache Service that I created.
>
> The ListSFTP processor lists the same 100 source files from the remote
> system on each node, and sends 300 flow files downstream to the FetchSFTP
> processor. I thought that the map cache allowed the cluster nodes to
> determine which files had already been listed by other cluster nodes...
> maybe I'm missing something.
>
> Any assistance is appreciated.
>
> NiFi version 1.0.0 in HDF 2.0.1
>
> -Nick
