Michael,

You mentioned that GetSFTP did not work.  Are you aware of FetchSFTP?
Unlike GetSFTP, FetchSFTP accepts an incoming flowfile.  The typical NiFi
pattern is for a List* processor to feed a Fetch* processor, which accepts
incoming flowfiles; Get* processors, by contrast, originate flowfiles
without any input.  It is not very obvious, I'm afraid.
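
To make the List/Fetch idea concrete, here is a rough sketch in plain
Python of the flow you described: list the remote names, drop the ones
you already hold, and fetch only the rest.  It is not NiFi code; the
function names (list_remote, select_missing, fetch_missing, fake_fetch)
and the sample file names are made up for illustration, and a real flow
would use processors rather than functions.

```python
from typing import Callable, Dict, Iterable, List, Set


def list_remote(listing: Iterable[str]) -> List[str]:
    """List step: return remote file names only; no file data is transferred."""
    return sorted(listing)


def select_missing(remote_names: Iterable[str], already_stored: Set[str]) -> List[str]:
    """Filter step: keep only the names that are not already in the target store."""
    return [name for name in remote_names if name not in already_stored]


def fetch_missing(names: Iterable[str], fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Fetch step: transfer exactly the files named by the incoming entries."""
    return {name: fetch(name) for name in names}


# Hypothetical sample data standing in for the remote site and for HDFS.
remote_site = ["c.dat", "a.dat", "b.dat"]
stored_in_hdfs = {"a.dat"}

fetched_log: List[str] = []


def fake_fetch(name: str) -> bytes:
    """Stand-in for a real transfer; records which names were requested."""
    fetched_log.append(name)
    return b"contents of " + name.encode()


missing = select_missing(list_remote(remote_site), stored_in_hdfs)
result = fetch_missing(missing, fake_fetch)
print(missing)
```

The point of the split is that the expensive transfer happens only in the
last step, so anything filtered out earlier never crosses the wire.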

Thanks,

James


On Fri, Jun 24, 2016 at 10:07 AM, Michael Dyer <[email protected]>
wrote:

> I'm looking for assistance in how to configure a set of processors so
> that I only retrieve 'new' files:
>
> - A GetSFTP processor that executes on a daily basis.
> - The GetSFTP processor has read-only access to the remote site
> - Large (Multi-GB) files are added to the remote site daily.
> - Naming of the files is unpredictable.
> - Files are rotated (removed) from the site after approximately 1 week
>
> Currently, I'm having to transfer ALL of the files on a daily basis and
> then I use PutHDFS processor which ignores (discards) any duplicates.
> Having to re-transfer files I already have is very inefficient, especially
> given the large file sizes.
>
> Does anyone know of a pattern to:
>
> 1) Retrieve a list of files
> 2) Compare each file against HDFS and
> 3) Retrieve any 'missing' files?
>
> I tried building this with ListSFTP, but then ran into a problem that
> GetSFTP does not allow me to use the ListSFTP results as an input.
>
> Thanks for the help!
>
> Michael
>
>
