On 5 Jan 2017, at 09:58, Manohar753 <manohar.re...@happiestminds.com> wrote:
Hi All,

Using Spark, is interoperable communication between two clouds (Google, AWS) possible? In my use case I need to take Google Cloud Storage as input to Spark, do some processing, and finally store the result in S3; my Spark engine runs on an AWS cluster. Please let me know if there is any way to handle this kind of use case directly with Spark, without any middle components, and share info or a link if you have one.

Thanks,

I've not played with GCS, and have some noted concerns about test coverage ( https://github.com/GoogleCloudPlatform/bigdata-interop/pull/40 ), but assuming you are not hitting any specific problems, it should be a matter of having the input as gs://bucket/path and the dest as s3a://bucket-on-s3/path2. You'll need the Google storage JARs on your classpath, along with those needed for s3n/s3a.

1. A little talk on the topic, though I only play with Azure and S3: https://www.youtube.com/watch?v=ND4L_zSDqF0

2. Some notes; bear in mind that the s3a performance tuning covered relates to things surfacing in Hadoop 2.8, which you probably won't have: https://hortonworks.github.io/hdp-aws/s3-spark/

A one-line test for a working S3 install is: can you read the Landsat CSV file?

    sparkContext.textFile("s3a://landsat-pds/scene_list.gz").count()

This should work from wherever you are, if your classpath and credentials are set up.
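The "JARs on your classpath" step above might look something like the following when launching spark-shell. The JAR names, versions, and key-file path here are illustrative assumptions, not exact requirements; match them to your Hadoop version and your own credentials:

```shell
# Launch spark-shell with both cloud connectors on the classpath.
# JAR names/versions and the key-file path are placeholders --
# pick the builds that match your Hadoop version.
spark-shell \
  --jars gcs-connector-hadoop2-latest.jar,hadoop-aws-2.7.3.jar,aws-java-sdk-1.7.4.jar \
  --conf spark.hadoop.fs.s3a.access.key=YOUR_AWS_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=YOUR_AWS_SECRET_KEY \
  --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/gcp-key.json
```

Any `spark.hadoop.*` property set this way is passed straight through to the underlying Hadoop filesystem clients, which is how both the s3a and gs connectors pick up their credentials.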
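Put together, the gs:// in / s3a:// out pattern described above might look like the sketch below. Bucket names, paths, and the processing step are placeholders, and it assumes the connector JARs and credentials are already configured on the cluster:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: read from Google Cloud Storage, process, write to S3.
// Bucket names and the filter step are placeholders for illustration only.
object GcsToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("gcs-to-s3").getOrCreate()
    val sc = spark.sparkContext

    // Input read through the GCS connector (gs:// scheme).
    val input = sc.textFile("gs://my-gcs-bucket/input/path")

    // Stand-in for whatever processing the job actually does.
    val processed = input.filter(_.nonEmpty)

    // Output written through the s3a connector.
    processed.saveAsTextFile("s3a://my-s3-bucket/output/path")

    spark.stop()
  }
}
```

Nothing here is cross-cloud-specific from Spark's point of view: both schemes are just Hadoop filesystems, so the same RDD code works once both connectors resolve.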