Thanks for the reply. Yeah, it doesn't seem doable straight away. Someone suggested this:
"For each of your streams, first create an empty RDD that you register as a table, obtaining an empty table. For your example, let's say you call it "allTeenagers". Then, for each of your queries, use SchemaRDD's insertInto method to add the result to that table: teenagers.insertInto("allTeenagers"). If you do this with both your streams, creating two separate accumulation tables, you can then join them using a plain old SQL query."

So I was trying that, but I can't seem to use the insertInto method correctly. Something like:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

    case class Person(name: String, age: Int)

    val p1 = Person("Hari", 22)
    val rdd1 = sc.parallelize(Array(p1))
    rdd1.registerAsTable("data")

    val p2 = Person("sagar", 22)
    val rdd2 = sc.parallelize(Array(p2))
    rdd2.insertInto("data")

gives the error:

    java.lang.AssertionError: assertion failed: No plan for InsertIntoTable Map(), false

Any thoughts?

Thanks


Hi again,

On Tue, Aug 26, 2014 at 10:13 AM, Tobias Pfeiffer <tgp@> wrote:
>
> On Mon, Aug 25, 2014 at 7:11 PM, praveshjain1991
> <praveshjain1991@> wrote:
>>
>> "If you want to issue an SQL statement on streaming data, you must have
>> both the registerAsTable() and the sql() call *within* the
>> foreachRDD(...) block, or -- as you experienced -- the table name will
>> be unknown"
>>
>> Since this is the case, is there any way to run a join over data
>> received from two different streams?
>>
>
> Couldn't you do dstream1.join(dstream2).foreachRDD(...)?

Ah, I guess you meant something like "SELECT * FROM dstream1 JOIN
dstream2 WHERE ..."? I don't know if that is possible. It doesn't seem
easy to me; I don't think that's doable with the current codebase...

Tobias
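For later readers: the assertion failure above appears to come from insertInto needing a table backed by a storage plan (e.g. a Hive or Parquet table), while a plain registerAsTable'd RDD has no insertable plan. A minimal sketch of a workaround under that assumption, accumulating with SchemaRDD's unionAll and re-registering the table for each batch ("allPeople" and addBatch are made-up names; Spark 1.0-era API):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{SQLContext, SchemaRDD}

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Person] -> SchemaRDD

    // Start from an empty accumulation table.
    var allPeople: SchemaRDD =
      sqlContext.createSchemaRDD(sc.parallelize(Seq.empty[Person]))
    allPeople.registerAsTable("allPeople")

    // For each new batch of rows, union it in and re-register the
    // result under the same name so later queries see the new data.
    def addBatch(batch: RDD[Person]): Unit = {
      allPeople = allPeople.unionAll(batch)
      allPeople.registerAsTable("allPeople")
    }

I haven't verified how this behaves over many batches (the lineage keeps growing), so treat it as a starting point rather than a tested recipe.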
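As for Tobias's dstream1.join(dstream2).foreachRDD(...) idea, here is a rough sketch of how it could combine with the "registerAsTable() and sql() inside foreachRDD" rule quoted above. stream1, stream2, and the Joined case class are placeholder names, and both streams are assumed to be DStreams of (String, Int) pairs:

    import org.apache.spark.streaming.StreamingContext._  // pair DStream ops such as join
    import sqlContext.createSchemaRDD                     // implicit RDD -> SchemaRDD

    case class Joined(key: String, leftVal: Int, rightVal: Int)

    // stream1: DStream[(String, Int)], stream2: DStream[(String, Int)]
    val joined = stream1.join(stream2)  // DStream[(String, (Int, Int))]

    joined.foreachRDD { rdd =>
      // Both the table registration and the query run inside foreachRDD,
      // per the advice quoted earlier in this thread.
      val records = rdd.map { case (k, (l, r)) => Joined(k, l, r) }
      records.registerAsTable("joined")
      sqlContext.sql("SELECT * FROM joined WHERE leftVal > 13")
        .collect().foreach(println)
    }

Note this joins each batch interval with the matching batch of the other stream only, so it is not the SQL join over accumulated data from both streams that was discussed above.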