A lot of work has gone into SparkSQL and DataFrames to optimize execution, 
and some of it targets the data source, e.g., optimizing reads from 
Parquet.

I was wondering whether using SparkSQL with streaming (in transform/foreachRDD) 
would benefit from these optimizations. Although it looks like data source 
optimization is (currently) not supported for streaming, will I gain any 
optimization by using SparkSQL on the stream’s RDDs?


Thanks,
Amit
