You might instead consider storing the data using saveAsParquetFile and then querying it after running sqlContext.parquetFile(...).registerTempTable(...).
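A minimal sketch of that approach in Scala (Spark 1.1-era API; the path and table names are hypothetical, and `sc` is assumed to be an existing SparkContext):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._

// Write the SchemaRDD out as Parquet (columnar, compact on disk).
val bigTable = sql("SELECT * FROM source_table")
bigTable.saveAsParquetFile("/data/big_table.parquet")

// Later: load the Parquet file back and register it for SQL queries.
val parquetRDD = sqlContext.parquetFile("/data/big_table.parquet")
parquetRDD.registerTempTable("big_table")
sql("SELECT COUNT(*) FROM big_table").collect()
```

Because Parquet is columnar and compressed, queries that touch only a few columns read far less data than scanning a deserialized in-memory table.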
On Sun, Sep 28, 2014 at 6:43 PM, Michael Armbrust <mich...@databricks.com> wrote:

> This is not possible until https://github.com/apache/spark/pull/2501 is merged.
>
> On Sun, Sep 28, 2014 at 6:39 PM, Haopu Wang <hw...@qilinsoft.com> wrote:
>
>> Thanks for the response. From the Spark Web UI's Storage tab, I do see the cached RDD there.
>>
>> But the storage level is "Memory Deserialized 1x Replicated". How can I change the storage level? I have a big table there.
>>
>> Thanks!
>>
>> ------------------------------
>> *From:* Cheng Lian [mailto:lian.cs....@gmail.com]
>> *Sent:* September 26, 2014 21:24
>> *To:* Haopu Wang; user@spark.apache.org
>> *Subject:* Re: Spark SQL question: is cached SchemaRDD storage controlled by "spark.storage.memoryFraction"?
>>
>> Yes, it is. The in-memory storage used with SchemaRDD also uses RDD.cache() under the hood.
>>
>> On 9/26/14 4:04 PM, Haopu Wang wrote:
>>
>> Hi, I'm querying a big table using Spark SQL. I see very long GC times in some stages. I wonder if I can improve them by tuning the storage parameters.
>>
>> The question is: the SchemaRDD has been cached with the cacheTable() function. So is the cached SchemaRDD part of the memory storage controlled by the "spark.storage.memoryFraction" parameter?
>>
>> Thanks!
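For the original memoryFraction question in the thread above: since cacheTable() uses RDD block storage under the hood, the heap fraction reserved for cached blocks can be tuned when the context is built. A hedged sketch (Spark 1.x configuration; the app name and fraction value are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.storage.memoryFraction controls the share of executor heap
// reserved for cached blocks, including tables cached via cacheTable().
// Lowering it from the 1.x default of 0.6 leaves more heap for execution,
// which can reduce GC pressure when caching a large table.
val conf = new SparkConf()
  .setAppName("cached-table-tuning")
  .set("spark.storage.memoryFraction", "0.4")
val sc = new SparkContext(conf)
```

Shrinking the cache fraction trades cached-table capacity for working memory; whether that helps GC depends on how much of the table actually fits in the cache.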