Michael,

You mentioned that GetSFTP did not work; are you aware of FetchSFTP? FetchSFTP accepts an incoming flowfile. The typical NiFi pattern is for a List* processor to feed a Fetch* processor, which accepts incoming flowfiles. Get* processors, by contrast, originate flowfiles without any input. It is not very obvious, I'm afraid.
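To make the List*/Fetch* split concrete, here is a toy Python model. It is not NiFi code, and the names `FlowFile`, `list_sftp`, and `fetch_sftp` are hypothetical. The lister takes no input and originates one flowfile per remote file, carrying only metadata. The fetcher accepts each incoming flowfile and pulls just that one file's content. The lister remembering what it has already emitted, so that a daily run yields only new names, is a simplifying assumption here (modeled as a plain set of names). Real processor state is handled differently.

```python
from dataclasses import dataclass


@dataclass
class FlowFile:
    # Metadata only until the fetch step fills in the content (toy model).
    attributes: dict
    content: bytes = b""


def list_sftp(remote_names, state):
    """Originate one flowfile per remote file not already recorded in `state`.

    Like a List* processor, this takes no incoming flowfile. `state` is a set
    of previously listed names, a simplification of real processor state.
    """
    out = []
    for name in remote_names:
        if name not in state:
            state.add(name)
            out.append(FlowFile(attributes={"filename": name}))
    return out


def fetch_sftp(flowfile, read_remote):
    """Accept an incoming flowfile and attach the content of the file it names.

    `read_remote` is a hypothetical callable standing in for the SFTP transfer.
    """
    flowfile.content = read_remote(flowfile.attributes["filename"])
    return flowfile
```

For example, with a shared `state = set()`, listing `["a.dat", "b.dat"]` on day one emits two flowfiles. Listing `["b.dat", "c.dat"]` on day two emits only `c.dat`, so only that large file is transferred.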
Thanks,
James

On Fri, Jun 24, 2016 at 10:07 AM, Michael Dyer <[email protected]> wrote:

> I'm looking for assistance in how to configure a set of processors so
> that I only retrieve 'new' files:
>
> - A GetSFTP processor that executes on a daily basis.
> - The GetSFTP processor has read-only access to the remote site.
> - Large (multi-GB) files are added to the remote site daily.
> - Naming of the files is unpredictable.
> - Files are rotated (removed) from the site after approximately 1 week.
>
> Currently, I'm having to transfer ALL of the files on a daily basis, and
> then I use a PutHDFS processor, which ignores (discards) any duplicates.
> Having to re-transfer files I already have is very inefficient, especially
> given the large file sizes.
>
> Does anyone know of a pattern to:
>
> 1) Retrieve a list of files
> 2) Compare each file against HDFS and
> 3) Retrieve any 'missing' files?
>
> I tried building this with ListSFTP, but then ran into a problem:
> GetSFTP does not allow me to use the ListSFTP results as an input.
>
> Thanks for the help!
>
> Michael
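Michael's three steps reduce to a set difference between the remote listing and the names already in HDFS. Below is a minimal Python sketch of that comparison step alone. It assumes both listings are plain file names and that a name is never reused for different content. The function `files_to_fetch` is hypothetical and is not a NiFi API.

```python
def files_to_fetch(remote_listing, hdfs_names):
    """Return names from the remote listing that are not yet in HDFS.

    Order follows the remote listing. Names that exist in HDFS but have been
    rotated off the remote site are simply ignored.
    """
    already_have = set(hdfs_names)
    missing = []
    for name in remote_listing:
        if name not in already_have:
            missing.append(name)
            already_have.add(name)  # never schedule the same name twice
    return missing
```

Comparing by name alone is the cheapest check. If the remote site could reuse a name for new content, the comparison would also need something like size or modification time. That is beyond what the question describes.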
