Yes, a previous prototype is available at https://github.com/Intel-bigdata/spark-streamsql, and a talk was given at last year's Spark Summit (http://spark-summit.org/2014/talk/streamsql-on-spark-manipulating-streams-by-sql-using-spark).
We are currently porting the prototype to the latest DataFrame API, and will provide a stable version for people to try soon.

Thanks,
-Jason

On Wed, Mar 11, 2015 at 9:12 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <hao.ch...@intel.com> wrote:
>
>> Intel has a prototype for doing this, SaiSai and Jason are the authors.
>> Probably you can ask them for some materials.
>>
>
> The github repository is here: https://github.com/intel-spark/stream-sql
>
> Also, what I did is writing a wrapper class SchemaDStream that internally
> holds a DStream[Row] and a DStream[StructType] (the latter having just one
> element in every RDD) and then allows to do
> - operations SchemaRDD => SchemaRDD using
> `rowStream.transformWith(schemaStream, ...)`
> - in particular you can register this stream's data as a table this way
> - and via a companion object with a method `fromSQL(sql: String):
> SchemaDStream` you can get a new stream from previously registered tables.
>
> However, you are limited to batch-internal operations, i.e., you can't
> aggregate across batches.
>
> I am not able to share the code at the moment, but will within the next
> months. It is not very advanced code, though, and should be easy to
> replicate. Also, I have no idea about the performance of transformWith....
>
> Tobias
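For anyone who wants to try the per-batch approach Tobias describes, here is a rough sketch of such a wrapper. It is not his code (which isn't public yet) and not part of the Intel prototype; the names `SchemaDStream`, `transformDF` and `toDF` are hypothetical. It assumes a Spark 2.x-era API, where `DataFrame` replaces `SchemaRDD` and `SQLContext.getOrCreate` is available. It pairs the row stream with the one-schema-per-RDD stream via `transformWith` and rebuilds a DataFrame for each batch.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.StructType
import org.apache.spark.streaming.dstream.DStream

// Hypothetical wrapper: a stream of rows plus a stream whose RDDs each hold
// a single StructType describing the rows of the same batch.
class SchemaDStream(val rows: DStream[Row], val schemas: DStream[StructType]) {

  // Applies a DataFrame => DataFrame function to every batch independently.
  // Each DataFrame sees one batch only, so nothing aggregates across batches.
  def transformDF(f: DataFrame => DataFrame): SchemaDStream = {
    // Row output: run f on the batch's DataFrame and keep its rows.
    val newRows = rows.transformWith(schemas, (r: RDD[Row], s: RDD[StructType]) =>
      SchemaDStream.toDF(r, s)
        .map(df => f(df).rdd)
        .getOrElse(r.sparkContext.emptyRDD[Row]))
    // Schema output: call f again only to read the resulting schema
    // (plan analysis, no job over the rows), stored as a one-element RDD.
    val newSchemas = rows.transformWith(schemas, (r: RDD[Row], s: RDD[StructType]) =>
      SchemaDStream.toDF(r, s)
        .map(df => r.sparkContext.parallelize(Seq(f(df).schema), 1))
        .getOrElse(r.sparkContext.emptyRDD[StructType]))
    new SchemaDStream(newRows, newSchemas)
  }
}

object SchemaDStream {
  // Rebuilds one batch's DataFrame; None when the batch carries no schema.
  // Kept in the companion object so the closures above don't capture `this`.
  def toDF(r: RDD[Row], s: RDD[StructType]): Option[DataFrame] =
    s.take(1).headOption.map { schema =>
      SQLContext.getOrCreate(r.sparkContext).createDataFrame(r, schema)
    }
}
```

Inside `f` you could also call `df.registerTempTable("name")` and query it with `df.sqlContext.sql(...)`, but only within the current batch, which matches the batch-internal limitation Tobias mentions. Note that `take(1)` on the schema RDD runs a small job in each of the two branches per batch.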