Mark,

Thanks for the pointer on SQS.

I think it would help to have a higher-level processor for DistCp that
covers both HDFS and S3 as source and sink.

Naga Vijayapuram


On Wed, Dec 2, 2015 at 9:48 AM, Mark Payne <[email protected]> wrote:

> We certainly can do the reverse case: sync S3 with HDFS. With S3, as Joe S
> mentioned, we really should have a ListS3 processor, but currently we do not
> (we do have a ListHDFS, though). The typical use case I've had with S3 is to
> set up S3 to send a notification via SQS when an object arrives, then have
> GetSQS receive that notification and pull the data via FetchS3Object.
> So you could fairly easily set up GetSQS -> EvaluateJSONPath ->
> FetchS3Object -> PutHDFS. That does require that SQS be set up to notify
> you when new objects arrive.
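
A minimal sketch of what the EvaluateJSONPath step in the GetSQS ->
EvaluateJSONPath -> FetchS3Object -> PutHDFS flow has to extract, assuming
the SQS message body is the raw S3 event notification JSON (no SNS envelope
around it). The function name and the sample bucket and key are hypothetical.
In NiFi itself the equivalent would likely be JSONPath expressions such as
$.Records[0].s3.bucket.name and $.Records[0].s3.object.key.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_notification(body):
    """Return (bucket, key) pairs found in an S3 event notification body.

    Assumes `body` is the JSON text of the notification as delivered to SQS.
    """
    event = json.loads(body)
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 notifications,
            # so decode them before using them to fetch the object.
            pairs.append((bucket, unquote_plus(key)))
    return pairs


# Hypothetical notification, trimmed to the fields used above.
sample = json.dumps({
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/my+file.csv", "size": 1024},
            },
        }
    ]
})

for bucket, key in s3_objects_from_notification(sample):
    print(bucket, key)
```

The bucket and key pulled out this way are what FetchS3Object needs in order
to retrieve the object before it is handed to PutHDFS.
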
>
> On Dec 2, 2015, at 12:24 PM, Naga Vijay <[email protected]> wrote:
>
> Joe Witt & Joe Skora,
>
> Thanks for thinking about this.  Yes, it would serve as a great
> example/template (as would the reverse case).
>
> Naga Vijayapuram
>
>
> On Tue, Dec 1, 2015 at 11:05 PM, Joe Skora <[email protected]> wrote:
>
>> @JoeW,
>>
>> It looks like we need to add a ListS3 processor in addition to the
>> Multipart Upload management that I'm looking into now.  Extending
>> ListFileTransfer for S3 shouldn't be too bad.
>>
>> JoeS
>>
>> On Wed, Dec 2, 2015 at 12:04 AM, Joe Witt <[email protected]> wrote:
>>
>>> Hello
>>>
>>> So we have FetchS3Object and PutHDFS and a series of interesting
>>> in-between processors to help.  So that would get you most of the way
>>> there.  How to get the listing/know what to pull from S3?  That part I'm
>>> not sure about.
>>>
>>> This would make for a great example/template for us to post (as would
>>> the reverse case).
>>>
>>> Thanks
>>> Joe
>>>
>>> On Tue, Dec 1, 2015 at 10:36 PM, Naga Vijay <[email protected]> wrote:
>>> > Hello,
>>> >
>>> > Is there a processor to DistCp from Amazon S3 to HDFS, or do I need to
>>> > write a processor for it?
>>> >
>>> > Thanks
>>> > Naga
>>>
>>
>>
>
>
