Mark,

Thanks for the pointer on SQS.
I think it would help to have a higher-level processor for DistCp that covers both HDFS and S3 as source/sink.

Naga Vijayapuram

On Wed, Dec 2, 2015 at 9:48 AM, Mark Payne <[email protected]> wrote:

> We certainly can do the reverse case - sync S3 with HDFS. With S3, as Joe
> S mentioned, we really should have a ListS3 but currently do not (we do
> have a ListHDFS though). Typically the use case that I've used with S3 is
> to set up S3 to notify when an object arrives via SQS. Then have GetSQS
> get that notification and then pull the data via FetchS3Object.
> So you could fairly easily set up a GetSQS -> EvaluateJSONPath ->
> FetchS3Object -> PutHDFS. That would require that SQS be set up though to
> notify you when new objects arrive.
>
> On Dec 2, 2015, at 12:24 PM, Naga Vijay <[email protected]> wrote:
>
> Joe Witt & Joe Skora,
>
> Thanks for thinking about this. Yes, it would serve as a great
> example/template (as would the reverse case).
>
> Naga Vijayapuram
>
>
> On Tue, Dec 1, 2015 at 11:05 PM, Joe Skora <[email protected]> wrote:
>
>> @JoeW,
>>
>> It looks like we need to add a ListS3 processor in addition to the
>> Multipart Upload management that I'm looking into now. Extending
>> ListFileTransfer for S3 shouldn't be too bad.
>>
>> JoeS
>>
>> On Wed, Dec 2, 2015 at 12:04 AM, Joe Witt <[email protected]> wrote:
>>
>>> Hello
>>>
>>> So we have FetchS3 and PutHDFS and a series of interesting in between
>>> processes to help. So that would get you most of the way there. How
>>> to get the listing/know what to pull from S3? That part I'm not sure
>>> about.
>>>
>>> This would make for a great example/template for us to post (as would
>>> the reverse case).
>>>
>>> Thanks
>>> Joe
>>>
>>> On Tue, Dec 1, 2015 at 10:36 PM, Naga Vijay <[email protected]> wrote:
>>> > Hello,
>>> >
>>> > Is there a processor to DistCp from Amazon S3 to HDFS, or do I need to
>>> > write a processor for it?
>>> >
>>> > Thanks
>>> > Naga
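
For anyone trying the GetSQS -> EvaluateJSONPath -> FetchS3Object -> PutHDFS flow Mark describes, the EvaluateJSONPath step has to pull the bucket name and object key out of the S3 notification that arrives through SQS. The Python sketch below only illustrates that extraction outside of NiFi; it is not NiFi code. It assumes the standard S3 event notification layout (a `Records` array with `s3.bucket.name` and `s3.object.key`), and the function name, bucket, and key are made up for the example. In EvaluateJSONPath the equivalent expressions would presumably look like `$.Records[0].s3.bucket.name` and `$.Records[0].s3.object.key`.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(message_body):
    """Return (bucket, key) pairs found in an S3 event notification body.

    Hypothetical helper for illustration; assumes the standard S3 event
    notification JSON layout delivered as the SQS message body.
    """
    event = json.loads(message_body)
    results = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys are URL-encoded in S3 notifications
            # (for example, a space arrives as '+').
            results.append((bucket, unquote_plus(key)))
    return results


# Made-up sample message; real notifications carry many more fields.
sample = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/my+file.csv"},
            },
        }
    ]
})

print(extract_s3_objects(sample))
```

The decoded bucket and key are what FetchS3Object would need as its Bucket and Object Key values before the content is handed to PutHDFS.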
