Hi,

On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <hao.ch...@intel.com> wrote:
> Intel has a prototype for doing this, SaiSai and Jason are the authors.
> Probably you can ask them for some materials.
> The github repository is here: https://github.com/intel-spark/stream-sql

Also, what I did was write a wrapper class SchemaDStream that internally holds a DStream[Row] and a DStream[StructType] (the latter having just one element in every RDD). It then lets you
- apply operations SchemaRDD => SchemaRDD using `rowStream.transformWith(schemaStream, ...)`; in particular, you can register the stream's data as a table this way, and
- get a new stream from previously registered tables via a companion object with a method `fromSQL(sql: String): SchemaDStream`.

However, you are limited to batch-internal operations, i.e., you can't aggregate across batches.

I am not able to share the code at the moment, but expect to within the next few months. It is not very advanced code, though, and should be easy to replicate. Also, I have no idea about the performance of transformWith.

Tobias
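
For readers who want to replicate the wrapper described above, here is a rough sketch. It is not the author's code, which was not shared. It assumes the Spark 1.2 API (`SchemaRDD`, `SQLContext.applySchema`, `SchemaRDD.registerTempTable`, `DStream.transformWith`). The method names `transform` and `registerAsTable` are hypothetical, and `fromSQL` is left out because the message does not say how it is implemented.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SchemaRDD, StructType}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical reconstruction of the wrapper idea; assumes Spark 1.2.
// Assumption: every RDD in schemaStream holds exactly one StructType.
class SchemaDStream(
    val sqlContext: SQLContext,
    val rowStream: DStream[Row],
    val schemaStream: DStream[StructType]) {

  // Apply a SchemaRDD => SchemaRDD operation to each batch separately.
  // f is applied once for the rows and once for the resulting schema.
  def transform(f: SchemaRDD => SchemaRDD): SchemaDStream = {
    val ctx = sqlContext // local copy so the closures do not capture `this`
    val newRows = rowStream.transformWith[StructType, Row](
      schemaStream,
      (rows: RDD[Row], schemas: RDD[StructType]) =>
        f(ctx.applySchema(rows, schemas.first())))
    val newSchemas = rowStream.transformWith[StructType, StructType](
      schemaStream,
      (rows: RDD[Row], schemas: RDD[StructType]) => {
        val result = f(ctx.applySchema(rows, schemas.first()))
        rows.sparkContext.parallelize(Seq(result.schema), 1)
      })
    new SchemaDStream(ctx, newRows, newSchemas)
  }

  // Re-register the current batch under `name` for every batch interval.
  def registerAsTable(name: String): Unit = {
    val ctx = sqlContext
    val registered = rowStream.transformWith[StructType, Row](
      schemaStream,
      (rows: RDD[Row], schemas: RDD[StructType]) => {
        val batch = ctx.applySchema(rows, schemas.first())
        batch.registerTempTable(name)
        batch
      })
    // An output operation is needed so the transform runs for each batch.
    registered.foreachRDD((_: RDD[Row]) => ())
  }
}
```

Because each call to `transformWith` only sees the RDDs of one batch interval, this design is limited to batch-internal operations, matching the restriction noted in the message.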