This should help:
https://cloud.google.com/hadoop/examples/bigquery-connector-spark-example
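
For the save-to-S3 half, once the data is in a DataFrame (as in the quoted
example below) it is just another filesystem write. A rough sketch, assuming
the hadoop-aws jar and its matching AWS SDK are also passed via --jars, and
with placeholder credentials and bucket name:

# Hypothetical values -- substitute your own AWS keys and bucket
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "YOUR_AWS_ACCESS_KEY")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "YOUR_AWS_SECRET_KEY")

# Write the DataFrame read from gs:// back out as tab-delimited CSV on S3
dfTermRaw.write.format("csv")\
    .option("header", "true")\
    .option("delimiter", "\t")\
    .save("s3a://your_bucket/sample_out")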

On 8 January 2017 at 03:49, neil90 <neilp1...@icloud.com> wrote:

> Here is how you would read from Google Cloud Storage (note you need to
> create a service account key) ->
>
> import os
>
> # Make the GCS connector jar visible to the PySpark shell
> os.environ['PYSPARK_SUBMIT_ARGS'] = """--jars /home/neil/Downloads/gcs-connector-latest-hadoop2.jar pyspark-shell"""
>
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
>
> conf = SparkConf()\
>     .setMaster("local[8]")\
>     .setAppName("GS")
>
> sc = SparkContext(conf=conf)
>
> # Register the GCS connector's filesystem implementations with Hadoop
> sc._jsc.hadoopConfiguration().set("fs.gs.impl",
>     "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
> sc._jsc.hadoopConfiguration().set("fs.AbstractFileSystem.gs.impl",
>     "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
> sc._jsc.hadoopConfiguration().set("fs.gs.project.id",
>     "PUT YOUR GOOGLE PROJECT ID HERE")
>
> # Authenticate as the service account using its key file
> sc._jsc.hadoopConfiguration().set("fs.gs.auth.service.account.email",
>     "testa...@sparkgcs.iam.gserviceaccount.com")
> sc._jsc.hadoopConfiguration().set("fs.gs.auth.service.account.enable",
>     "true")
> sc._jsc.hadoopConfiguration().set("fs.gs.auth.service.account.keyfile",
>     "sparkgcs-96bd21691c29.p12")
>
> spark = SparkSession.builder\
>     .config(conf=sc.getConf())\
>     .getOrCreate()
>
> # Read a tab-delimited file straight from the GCS bucket
> dfTermRaw = spark.read.format("csv")\
>     .option("header", "true")\
>     .option("delimiter", "\t")\
>     .option("inferSchema", "true")\
>     .load("gs://bucket_test/sample.tsv")