Hi all,

We are planning to use Spark SQL in a data warehouse (DW) system, and we have 
a question about its caching mechanism.

For example, if I run a query like:

    sqlContext.sql("select c1, sum(c2) from T1, T2 where T1.key=T2.key group by c1").cache()

Is it going to cache the final result of the query, or the raw data of each 
table used in it?

Since users may run many different queries against those tables, if only the 
final result is cached, a brand-new query would still have to scan the entire 
base tables, which may take a very long time.
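For instance, a hypothetical follow-up query like the one below (same T1/T2 
schema as above) would hit the same base tables:

    // A different aggregation over the same tables; if only the first
    // query's result were cached, this would still scan T1 and T2 in full.
    sqlContext.sql("select c1, max(c2) from T1, T2 where T1.key=T2.key group by c1")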

If that is the case, is there a better way to cache the base tables 
themselves instead of the final result?
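For what it's worth, here is a minimal sketch of what we have in mind, 
assuming SQLContext.cacheTable caches a registered table's data for reuse 
across later queries (please correct us if that assumption is wrong):

    // Cache the base tables once, up front, rather than one query's result.
    sqlContext.cacheTable("T1")
    sqlContext.cacheTable("T2")

    // The hope is that any later query over T1/T2, including brand-new ones,
    // then reads from the cached data instead of rescanning the source.
    val result = sqlContext.sql(
      "select c1, sum(c2) from T1, T2 where T1.key=T2.key group by c1")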

Thanks
