Hi Folks,

I’d like some tips on converting the RDDs inside a Spark Streaming
DStream into a set of SchemaRDDs.

My DStream contains JSON data pushed over from Kafka, and I’d like to use
Spark SQL’s JSON import function (i.e., jsonRDD) to register the JSON dataset
as a table and run queries against it.
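
For reference, the per-batch flow I’m trying to end up with looks roughly
like the sketch below; the table name, input path, and query are just
placeholders for illustration:

…
val sql = new SQLContext(sc)
// Placeholder input: in the streaming job each RDD[String] would come from the DStream.
val jsonLines = sc.textFile("events.json")
// Infer a schema from the JSON strings and register the result as a queryable table.
val schemaRDD = sql.jsonRDD(jsonLines)
schemaRDD.registerTempTable("events")
sql.sql("SELECT * FROM events LIMIT 10").collect().foreach(println)
…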

Here’s a code snippet of my latest attempt (in Scala):
…
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val sc = new SparkContext(conf)
// Reuse the existing SparkContext rather than creating a second one
// via new StreamingContext("local", appName, ...).
val ssc = new StreamingContext(sc, Seconds(1))
ssc.checkpoint("checkpoint")

// Consume the message values (_._2) from the "topic" topic via ZooKeeper.
val stream = KafkaUtils.createStream(ssc, "localhost:2181", "group",
  Map("topic" -> 10)).map(_._2)
val sql = new SQLContext(sc)

stream.foreachRDD(rdd => {
  if (rdd.count > 0) {
    // Messages received: infer a schema from the JSON strings.
    val sqlRDD = sql.jsonRDD(rdd)
    sqlRDD.printSchema()
  } else {
    println("No message received")
  }
})
…

This compiles and runs when I submit it to Spark (local mode); however, I
never see a schema printed to my console via the “sqlRDD.printSchema()” call
while Kafka is streaming my JSON messages to the “topic” topic. I know my
JSON is valid and my Kafka connection works fine; I’ve been able to print the
stream messages in their raw format, just not as SchemaRDDs.
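
The raw-format check that does work is essentially something like this, so
messages are definitely arriving:

…
// Debug check only: collect() pulls each batch to the driver and prints the raw JSON strings.
stream.foreachRDD(rdd => rdd.collect().foreach(println))
…

It’s only the jsonRDD/printSchema step that never produces output.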

Any tips? Suggestions?

Thanks much,
---
Rishi Verma
NASA Jet Propulsion Laboratory
California Institute of Technology




