Re: SQL with Spark Streaming

Jason Dai Wed, 11 Mar 2015 06:42:07 -0700

Yes, a previous prototype is available at
https://github.com/Intel-bigdata/spark-streamsql, and a talk was given at
last year's Spark Summit (
http://spark-summit.org/2014/talk/streamsql-on-spark-manipulating-streams-by-sql-using-spark
).


We are currently porting the prototype to use the latest DataFrame API, and
will provide a stable version for people to try soon.

Thanks,
-Jason


On Wed, Mar 11, 2015 at 9:12 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Hi,
>
> On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <hao.ch...@intel.com> wrote:
>
>>  Intel has a prototype for doing this, SaiSai and Jason are the authors.
>> Probably you can ask them for some materials.
>>
>
> The github repository is here: https://github.com/intel-spark/stream-sql
>
> Also, what I did was write a wrapper class SchemaDStream that internally
> holds a DStream[Row] and a DStream[StructType] (the latter having just one
> element in every RDD) and then allows you to do
> - operations SchemaRDD => SchemaRDD using
> `rowStream.transformWith(schemaStream, ...)`
> - in particular you can register this stream's data as a table this way
> - and via a companion object with a method `fromSQL(sql: String):
> SchemaDStream` you can get a new stream from previously registered tables.
>
> However, you are limited to batch-internal operations, i.e., you can't
> aggregate across batches.
>
> I am not able to share the code at the moment, but will be able to within
> the coming months. It is not very advanced code, though, and should be easy to
> replicate. Also, I have no idea about the performance of transformWith....
>
> Tobias
>
>
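
The wrapper idea described in the quoted message can be modeled without
Spark. What follows is only a minimal sketch under stated assumptions: plain
Scala collections stand in for Spark types (`Seq` plays the role of both RDD
and DStream), and `Schema`, `Batch`, `SchemaStream`, `transformBatch` and
`countPerBatch` are hypothetical names, not Tobias's code and not part of any
Spark API. It illustrates why such a wrapper is limited to batch-internal
operations.

```scala
// Plain-Scala model: Seq stands in for both RDD and DStream; no Spark dependency.
object SchemaStreamSketch {
  type Row = Seq[Any]                          // stand-in for Spark SQL's Row
  final case class Schema(fields: Seq[String]) // stand-in for StructType

  // One micro-batch: its rows plus the single schema that describes them.
  final case class Batch(schema: Schema, rows: Seq[Row])

  // Hypothetical wrapper: a stream is modeled as a finite sequence of batches.
  final case class SchemaStream(batches: Seq[Batch]) {
    // Batch-internal operation: f sees one batch at a time, never its neighbours.
    def transformBatch(f: Batch => Batch): SchemaStream =
      SchemaStream(batches.map(f))
  }

  // Example batch-local aggregate: count the rows of each batch.
  def countPerBatch(s: SchemaStream): SchemaStream =
    s.transformBatch(b => Batch(Schema(Seq("count")), Seq(Seq(b.rows.size))))

  def main(args: Array[String]): Unit = {
    val schema = Schema(Seq("word"))
    val stream = SchemaStream(Seq(
      Batch(schema, Seq(Seq("a"), Seq("b"))),
      Batch(schema, Seq(Seq("c")))))
    // Yields one count per batch, not a running total across batches.
    println(countPerBatch(stream).batches.map(_.rows.head.head))
  }
}
```

Running `main` prints `List(2, 1)`: each batch is counted on its own, which
mirrors the limitation that aggregates cannot span batches.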
