In GCP the equivalent of HDFS is Google Cloud Storage. You only have to change the URL from hdfs://<hdfs storage directory> to gs://<hdfs storage directory>.
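As a minimal sketch of that point (assuming the GCS connector is on the cluster classpath; the bucket name and job class below are placeholders, not from this thread), a standard Hadoop job driver needs nothing changed except the gs:// paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Sketch of a map-only job over SequenceFiles; identical to an HDFS job
// except for the gs:// URIs. Bucket/class names are hypothetical.
public class GcsSequenceFileJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "gcs-sequencefile-job");
    job.setJarByClass(GcsSequenceFileJob.class);
    job.setMapperClass(Mapper.class);   // identity mapper: pass records through
    job.setNumReduceTasks(0);           // map-only job
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(BytesWritable.class);
    // The only change versus an HDFS job: hdfs://... becomes gs://...
    SequenceFileInputFormat.addInputPath(job, new Path("gs://my-bucket/input"));
    SequenceFileOutputFormat.setOutputPath(job, new Path("gs://my-bucket/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```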
The MapReduce APIs will work as-is with this change. You run MapReduce jobs on a Google Dataproc cluster, and your storage is a Google Cloud Storage bucket. Refer to the GCP documentation.

On Friday, June 14, 2019, Amit Kabra <amitkabrai...@gmail.com> wrote:
> Any help here ?
> On Thu, Jun 13, 2019 at 12:38 PM Amit Kabra <amitkabrai...@gmail.com> wrote:
>>
>> Hello,
>> I have a requirement where I need to read/write data to a public cloud via a MapReduce job.
>> Our systems currently read and write data from HDFS using MapReduce, and it is working well; we write data in SequenceFile format.
>> We might have to move the data to a public cloud, i.e. S3 / GCP, where everything remains the same except that we read/write to S3/GCP.
>> I did a quick search for GCP and didn't get much info on doing MapReduce directly against it. The GCS connector for Hadoop looks closest, but I didn't find any MapReduce sample for it.
>> Any help on where to start, or on whether it is even possible (say, an S3/GCP output format doesn't exist, etc., and we need to do some hack)?
>> Thanks,
>> Amit Kabra.
>>
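On Dataproc the GCS connector comes preinstalled, so an existing MapReduce jar can be submitted unchanged; a sketch of the submission command (cluster name, region, jar path, and bucket paths are placeholders, not from this thread):

```shell
# Submit an existing MapReduce jar to a Dataproc cluster.
# All names below are hypothetical examples.
gcloud dataproc jobs submit hadoop \
  --cluster=my-cluster \
  --region=us-central1 \
  --jar=gs://my-bucket/jobs/my-mr-job.jar \
  -- gs://my-bucket/input gs://my-bucket/output
```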