In GCP the equivalent of HDFS is Google Cloud Storage. You only have to change the URL from hdfs://<hdfs storage directory> to gs://<hdfs storage directory>.
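As a minimal sketch of that point (assuming the GCS connector is on the cluster classpath; the bucket name and job class below are placeholders, not from this thread), a standard Hadoop job driver needs nothing changed except the gs:// paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Sketch of a map-only job over SequenceFiles; identical to an HDFS job
// except for the gs:// URIs. Bucket/class names are hypothetical.
public class GcsSequenceFileJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "gcs-sequencefile-job");
    job.setJarByClass(GcsSequenceFileJob.class);
    job.setMapperClass(Mapper.class);   // identity mapper: pass records through
    job.setNumReduceTasks(0);           // map-only job
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(BytesWritable.class);
    // The only change versus an HDFS job: hdfs://... becomes gs://...
    SequenceFileInputFormat.addInputPath(job, new Path("gs://my-bucket/input"));
    SequenceFileOutputFormat.setOutputPath(job, new Path("gs://my-bucket/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```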
The MapReduce APIs will work as-is with this change. You run MapReduce jobs on a Google Dataproc cluster, and your storage is a Google Cloud Storage bucket. Refer to the GCP documentation.

On Friday, June 14, 2019, Amit Kabra <amitkabrai...@gmail.com> wrote:
> Any help here ?
> On Thu, Jun 13, 2019 at 12:38 PM Amit Kabra <amitkabrai...@gmail.com> wrote:
>>
>> Hello,
>> I have a requirement where I need to read/write data to a public cloud via a MapReduce job.
>> Our systems currently read and write data from HDFS using MapReduce, and it is working well; we write data in SequenceFile format.
>> We might have to move the data to a public cloud, i.e. S3 / GCP, where everything remains the same except that we read/write to S3/GCP.
>> I did a quick search for GCP and didn't get much info on doing MapReduce directly against it. The GCS connector for Hadoop looks closest, but I didn't find any MapReduce sample for it.
>> Any help on where to start, or on whether it is even possible (say, an S3/GCP output format doesn't exist, etc., and we need to do some hack)?
>> Thanks,
>> Amit Kabra.
>>
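On Dataproc the GCS connector comes preinstalled, so an existing MapReduce jar can be submitted unchanged; a sketch of the submission command (cluster name, region, jar path, and bucket paths are placeholders, not from this thread):

```shell
# Submit an existing MapReduce jar to a Dataproc cluster.
# All names below are hypothetical examples.
gcloud dataproc jobs submit hadoop \
  --cluster=my-cluster \
  --region=us-central1 \
  --jar=gs://my-bucket/jobs/my-mr-job.jar \
  -- gs://my-bucket/input gs://my-bucket/output
```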