A lot of work has gone into optimizing execution in SparkSQL and DataFrames, some of it at the data source level, e.g., optimizing reads from Parquet.
I was wondering whether using SparkSQL with streaming (inside transform/foreachRDD) could benefit from these optimizations. Although it currently looks like data source optimizations are not supported for streaming, would I gain any optimization by using SparkSQL on the stream's RDDs?

Thanks,
Amit