We certainly can do the reverse case: sync S3 with HDFS. With S3, as Joe S
mentioned, we really should have a ListS3 processor but currently do not (we
do have ListHDFS, though). The approach I've typically used with S3 is to
configure the bucket to send a notification to SQS whenever an object arrives,
have GetSQS receive that notification, and then pull the data with
FetchS3Object. So you could fairly easily set up a flow of
GetSQS -> EvaluateJsonPath -> FetchS3Object -> PutHDFS. That does require
that S3 event notifications to SQS be set up so that you are told when new
objects arrive.
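
To make the EvaluateJsonPath step concrete, here is a rough Python sketch of
the extraction it would perform on the message body that GetSQS hands over.
It assumes S3 publishes its event notification directly to the queue (not
wrapped in an SNS envelope) and that the body follows the standard S3 event
layout, where the JSONPath expressions would be $.Records[0].s3.bucket.name
and $.Records[0].s3.object.key. The function name, bucket, and key below are
made up for illustration.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_notification(body):
    """Return (bucket, key) pairs found in an S3 event notification body."""
    event = json.loads(body)
    pairs = []
    # Test messages from S3 carry no "Records" list, so default to empty.
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (a space arrives as "+").
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Hypothetical notification body, trimmed to the fields used above.
sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/2015/my+file.csv"},
            },
        }
    ]
})

for bucket, key in s3_objects_from_notification(sample_body):
    print(bucket, key)
```

In the flow itself, the two extracted values would go into FlowFile
attributes that FetchS3Object's Bucket and Object Key properties reference
through expression language; the attribute names are up to you.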

> On Dec 2, 2015, at 12:24 PM, Naga Vijay <[email protected]> wrote:
> 
> Joe Witt & Joe Skora,
> 
> Thanks for thinking about this.  Yes, it would serve as a great 
> example/template (as would the reverse case).
> 
> Naga Vijayapuram
> 
> 
> On Tue, Dec 1, 2015 at 11:05 PM, Joe Skora <[email protected]> wrote:
> @JoeW,
> 
> It looks like we need to add a ListS3 processor in addition to the Multipart 
> Upload management that I'm looking into now.  Extending ListFileTransfer for 
> S3 shouldn't be too bad.
> 
> JoeS
> 
> On Wed, Dec 2, 2015 at 12:04 AM, Joe Witt <[email protected]> wrote:
> Hello
> 
> So we have FetchS3 and PutHDFS and a series of interesting in between
> processes to help.  So that would get you most of the way there.  How
> to get the listing/know what to pull from S3?  That part I'm not sure
> about.
> 
> This would make for a great example/template for us to post (as would
> the reverse case).
> 
> Thanks
> Joe
> 
> On Tue, Dec 1, 2015 at 10:36 PM, Naga Vijay <[email protected]> wrote:
> > Hello,
> >
> > Is there a processor to DistCp from Amazon S3 to HDFS, or do I need to write
> > a processor for it?
> >
> > Thanks
> > Naga
> 
> 
