Hi Michael,

ListSFTP is designed to work with FetchSFTP. I think those two processors
would give you what you want.

Sorry for the confusion: the "Get" processors were the original ones, and
the list + fetch processors came later.
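In case it helps, a minimal flow for your case might look something like
this (I'm writing the property names from memory, so double-check them
against the FetchSFTP docs for your NiFi version; FetchSFTP's defaults
already reference the attributes that ListSFTP writes):

    ListSFTP  (scheduled once daily, Primary Node in a cluster)
        | success
        v
    FetchSFTP
        Hostname:    ${sftp.remote.host}
        Port:        ${sftp.remote.port}
        Remote File: ${path}/${filename}
        | success
        v
    PutHDFS

ListSFTP also keeps state about what it has already listed, so each run
should only emit files it hasn't seen before, which means you may not
need to compare against HDFS at all.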

-Bryan


On Fri, Jun 24, 2016 at 1:07 PM, Michael Dyer <[email protected]>
wrote:

> I'm looking for assistance in how to configure a set of processors so
> that I only retrieve 'new' files:
>
> - A GetSFTP processor that executes on a daily basis.
> - The GetSFTP processor has read-only access to the remote site
> - Large (Multi-GB) files are added to the remote site daily.
> - Naming of the files is unpredictable.
> - Files are rotated (removed) from the site after approximately 1 week
>
> Currently, I'm having to transfer ALL of the files on a daily basis and
> then use a PutHDFS processor, which ignores (discards) any duplicates.
> Having to re-transfer files I already have is very inefficient, especially
> given the large file sizes.
>
> Does anyone know of a pattern to:
>
> 1) Retrieve a list of files
> 2) Compare each file against HDFS and
> 3) Retrieve any 'missing' files?
>
> I tried building this with ListSFTP, but then ran into a problem that
> GetSFTP does not allow me to use the ListSFTP results as an input.
>
> Thanks for the help!
>
> Michael
>
>