Hi all,
I am using Spark Streaming to monitor an S3 bucket for objects that contain JSON, and I want to import that JSON into a Spark SQL DataFrame.
Here's my current code:
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
import json
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 30)
sqlContext = SQLContext(sc)

distFile = ssc.textFileStream("s3n://mybucket/")
json_data = sqlContext.jsonRDD(distFile)
json_data.printSchema()

ssc.start()
ssc.awaitTermination()
I am not creating the DataFrame correctly, as I get this error:

'TransformedDStream' object has no attribute '_jrdd'
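From the error, my guess is that sqlContext.jsonRDD() expects a plain RDD, while textFileStream() returns a DStream, so maybe I need to convert each micro-batch myself inside foreachRDD. Something like this (untested sketch, same bucket path as above)?

```python
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 30)
sqlContext = SQLContext(sc)

lines = ssc.textFileStream("s3n://mybucket/")

def process(time, rdd):
    # jsonRDD takes a plain RDD of JSON strings, so convert
    # each micro-batch here instead of passing the DStream itself
    if not rdd.isEmpty():
        df = sqlContext.jsonRDD(rdd)
        df.printSchema()

lines.foreachRDD(process)

ssc.start()
ssc.awaitTermination()
```

Is that the right pattern, or is there a more direct way to get a DataFrame out of a DStream?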
Can someone help me out?
Thanks,
Vadim