Re: SQL with Spark Streaming

Tobias Pfeiffer Tue, 10 Mar 2015 18:14:43 -0700

Hi,

On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <hao.ch...@intel.com> wrote:


>  Intel has a prototype for doing this, SaiSai and Jason are the authors.
> Probably you can ask them for some materials.
>

The github repository is here: https://github.com/intel-spark/stream-sql

Also, what I did was write a wrapper class SchemaDStream that internally
holds a DStream[Row] and a DStream[StructType] (the latter having just one
element in every RDD) and that then allows you to do
- operations SchemaRDD => SchemaRDD using
`rowStream.transformWith(schemaStream, ...)`
- in particular you can register this stream's data as a table this way
- and via a companion object with a method `fromSQL(sql: String):
SchemaDStream` you can get a new stream from previously registered tables.

However, you are limited to batch-internal operations, i.e., you can't
aggregate across batches.
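Since the original code is not public, here is a toy model of the wrapper idea in plain Scala. It has no Spark dependency, so it is only an illustration under assumptions and not the author's code. MiniDStream, Schema, Table and select are all hypothetical. MiniDStream stands in for a DStream as a finite list of batches, and its transformWith only mimics the batch-by-batch pairing of Spark's DStream.transformWith. A real version would hold actual DStreams and build SchemaRDDs through an SQLContext.

```scala
// Toy model of the SchemaDStream idea in plain Scala (no Spark needed).
// All names here are hypothetical stand-ins, not Spark APIs.
object SchemaDStreamSketch {
  type Row = Seq[Any]

  // Stand-in for Spark's StructType: only the column names.
  final case class Schema(columns: Seq[String])

  // Stand-in for a DStream: a finite sequence of batches.
  final case class MiniDStream[T](batches: Seq[Seq[T]]) {
    // Pairs batch i of this stream with batch i of `other`, mimicking
    // the batch-by-batch behaviour of DStream.transformWith.
    def transformWith[U, V](other: MiniDStream[U])(f: (Seq[T], Seq[U]) => Seq[V]): MiniDStream[V] =
      MiniDStream(batches.zip(other.batches).map { case (a, b) => f(a, b) })
  }

  // One batch viewed as a table: its schema plus its rows.
  final case class Table(schema: Schema, rows: Seq[Row])

  // Wrapper holding a row stream and a schema stream with one element per batch.
  final case class SchemaDStream(rowStream: MiniDStream[Row], schemaStream: MiniDStream[Schema]) {
    // Applies a table => table operation to each batch independently.
    def transform(op: Table => Table): SchemaDStream = {
      val tables = rowStream.transformWith(schemaStream) { (rows, schemas) =>
        Seq(op(Table(schemas.head, rows)))
      }
      SchemaDStream(
        MiniDStream(tables.batches.map(_.head.rows)),
        MiniDStream(tables.batches.map(batch => Seq(batch.head.schema))))
    }
  }

  // Example operation: keep only the named columns (assumes they exist).
  def select(cols: String*): Table => Table = { t =>
    val idx = cols.map(c => t.schema.columns.indexOf(c))
    Table(Schema(cols), t.rows.map(r => idx.map(i => r(i))))
  }

  val people = Schema(Seq("name", "age"))
  val rows = MiniDStream[Row](Seq(
    Seq(Seq("alice", 30), Seq("bob", 25)), // batch 1
    Seq(Seq("carol", 41))                   // batch 2
  ))
  val schemas = MiniDStream(Seq(Seq(people), Seq(people))) // one schema per batch
  val names = SchemaDStream(rows, schemas).transform(select("name"))

  def main(args: Array[String]): Unit =
    names.rowStream.batches.foreach(println)
}
```

Each call to `op` only ever sees the rows of a single batch, which is exactly why this design cannot aggregate across batches.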

I am not able to share the code at the moment, but I will within the next
few months. It is not very advanced code, though, and should be easy to
replicate. Also, I have no idea about the performance of transformWith...

Tobias
