Thanks Tathagata, we actually found the problem: I created the SQLContext and the StreamingContext from different SparkContexts. But thanks for your help!
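[For anyone hitting the same issue: the fix Siyuan describes amounts to backing both contexts with a single SparkContext. A minimal sketch, assuming the Spark 1.0-era APIs used in this thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("KafkaSpark")
// Create one SparkContext and derive everything else from it.
val sc = new SparkContext(sparkConf)
val ssc = new StreamingContext(sc, Seconds(10)) // reuse sc, don't build a second context
val sqlContext = new SQLContext(sc)             // the same sc now backs both SQL and streaming
```

This avoids the two-contexts situation in the code at the bottom of the thread, where `new StreamingContext(sparkConf, ...)` and `new SparkContext(sparkConf)` each create their own context.]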
Best,
Siyuan

On Tue, Jul 15, 2014 at 6:53 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Oh yes, we have run SQL, streaming and MLlib all together.
>
> You can take a look at the demo <https://databricks.com/cloud> that
> Databricks gave at the Spark Summit.
>
> I think I see what the problem is. sql("....") returns an RDD, and
> println(rdd) prints only the RDD's name, while rdd.foreach(println) prints
> the records on the executors, so you won't find anything in the driver logs!
> So try doing a collect or a take on the RDD returned by the sql query and
> print that.
>
> TD
>
>
> On Tue, Jul 15, 2014 at 4:28 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
>
>> By the way, have you ever run SQL and streaming together? Do you know of
>> any example that works? Thanks!
>>
>>
>> On Tue, Jul 15, 2014 at 4:28 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
>>
>>> Hi Tathagata,
>>>
>>> I can see the output of count, but no SQL results. Running standalone is
>>> meaningless for me; I just run on my local single-node YARN cluster.
>>> Thanks
>>>
>>>
>>> On Tue, Jul 15, 2014 at 12:48 PM, Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>>
>>>> Could you run it locally first to make sure it works and you see
>>>> output? Also, I recommend going through the previous step-by-step
>>>> approach to narrow down where the problem is.
>>>>
>>>> TD
>>>>
>>>>
>>>> On Mon, Jul 14, 2014 at 9:15 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
>>>>
>>>>> Actually, I deployed this on a YARN cluster (spark-submit) and I
>>>>> couldn't find any output in the YARN stdout logs.
>>>>>
>>>>>
>>>>> On Mon, Jul 14, 2014 at 6:25 PM, Tathagata Das <
>>>>> tathagata.das1...@gmail.com> wrote:
>>>>>
>>>>>> Can you make sure you are running locally with more than 1 local
>>>>>> core? You can set the master in the SparkConf as
>>>>>> conf.setMaster("local[4]"). Then see if there are jobs running on
>>>>>> every batch of data in the Spark web UI (running on localhost:4040).
>>>>>> If you still don't get any output, try first simply printing
>>>>>> recRDD.count() in the foreachRDD (that is, first test Spark
>>>>>> Streaming). If you can get that to work, then I would test the Spark
>>>>>> SQL stuff.
>>>>>>
>>>>>> TD
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 14, 2014 at 5:25 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
>>>>>>
>>>>>>> No errors but no output either... Thanks!
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 14, 2014 at 4:59 PM, Tathagata Das <
>>>>>>> tathagata.das1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Could you elaborate on the problem you are facing? Compiler error?
>>>>>>>> Runtime error? Class-not-found error? Not receiving any data from
>>>>>>>> Kafka? Receiving data but the SQL command throwing an error? No
>>>>>>>> errors but no output either?
>>>>>>>>
>>>>>>>> TD
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 14, 2014 at 4:06 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> A couple of days ago, I tried to integrate SQL and streaming
>>>>>>>>> together. My understanding is that I can transform each RDD of a
>>>>>>>>> DStream into a SchemaRDD and execute SQL on it, but I've had no luck.
>>>>>>>>> Would you help me take a look at my code? Thank you very much!
>>>>>>>>>
>>>>>>>>> object KafkaSpark {
>>>>>>>>>
>>>>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>>>>     if (args.length < 4) {
>>>>>>>>>       System.err.println("Usage: KafkaSpark <zkQuorum> <group> <topics> <numThreads>")
>>>>>>>>>       System.exit(1)
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     val Array(zkQuorum, group, topics, numThreads) = args
>>>>>>>>>     val sparkConf = new SparkConf().setAppName("KafkaSpark")
>>>>>>>>>     val ssc = new StreamingContext(sparkConf, Seconds(10))
>>>>>>>>>     val sc = new SparkContext(sparkConf)
>>>>>>>>>     val sqlContext = new SQLContext(sc)
>>>>>>>>>     // ssc.checkpoint("checkpoint")
>>>>>>>>>
>>>>>>>>>     // Importing the SQL context gives access to all the SQL
>>>>>>>>>     // functions and implicit conversions.
>>>>>>>>>     import sqlContext._
>>>>>>>>>
>>>>>>>>>     val tt = Time(10000)
>>>>>>>>>     val topicpMap = topics.split(",").map((_, numThreads.toInt)).toMap
>>>>>>>>>     val recordsStream = KafkaUtils.createStream(ssc, zkQuorum,
>>>>>>>>>       group, topicpMap).map(t => getRecord(t._2.split("#")))
>>>>>>>>>
>>>>>>>>>     val result = recordsStream.foreachRDD((recRDD, tt) => {
>>>>>>>>>       recRDD.registerAsTable("records")
>>>>>>>>>       val result = sql("select * from records")
>>>>>>>>>       println(result)
>>>>>>>>>       result.foreach(println)
>>>>>>>>>     })
>>>>>>>>>
>>>>>>>>>     ssc.start()
>>>>>>>>>     ssc.awaitTermination()
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>   def getRecord(l: Array[String]): Record = {
>>>>>>>>>     println("Getting the record")
>>>>>>>>>     Record(l(0), l(1))
>>>>>>>>>   }
>>>>>>>>> }
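[Editor's note: combining TD's collect/take suggestion with the shared-context fix from the top of the thread, the foreachRDD body above could be reworked roughly as follows. This is a hedged sketch, assuming the same sqlContext, recordsStream, and Record case class as in the quoted code, and a SQLContext built from the StreamingContext's own SparkContext:

```scala
// Process each micro-batch; recRDD is an RDD[Record], which the implicits
// imported from sqlContext turn into a SchemaRDD for registerAsTable.
recordsStream.foreachRDD { recRDD =>
  recRDD.registerAsTable("records")
  val result = sqlContext.sql("select * from records")
  // collect() (or take(n) for a bounded sample) moves the rows to the
  // driver, so the println output appears in the driver log rather than
  // scattered across the executors.
  result.collect().foreach(println)
}
```

Note that println(result) on the original code prints only the RDD's string representation, and result.foreach(println) runs on the executors, which is why no rows showed up in the driver or YARN stdout logs.]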