You might instead consider storing the data using saveAsParquetFile and then querying it after running sqlContext.parquetFile(...).registerTempTable(...).
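A minimal sketch of that approach in Scala (Spark 1.1-era API; the path and table names are hypothetical, and `sc` is assumed to be an existing SparkContext):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._

// Write the SchemaRDD out as Parquet (columnar, compact on disk).
val bigTable = sql("SELECT * FROM source_table")
bigTable.saveAsParquetFile("/data/big_table.parquet")

// Later: load the Parquet file back and register it for SQL queries.
val parquetRDD = sqlContext.parquetFile("/data/big_table.parquet")
parquetRDD.registerTempTable("big_table")
sql("SELECT COUNT(*) FROM big_table").collect()
```

Because Parquet is columnar and compressed, queries that touch only a few columns read far less data than scanning a deserialized in-memory table.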
On Sun, Sep 28, 2014 at 6:43 PM, Michael Armbrust <mich...@databricks.com> wrote:

> This is not possible until https://github.com/apache/spark/pull/2501 is merged.
>
> On Sun, Sep 28, 2014 at 6:39 PM, Haopu Wang <hw...@qilinsoft.com> wrote:
>
>> Thanks for the response. From the Spark Web UI's Storage tab, I do see the cached RDD there.
>>
>> But the storage level is "Memory Deserialized 1x Replicated". How can I change the storage level? I have a big table there.
>>
>> Thanks!
>>
>> ------------------------------
>> *From:* Cheng Lian [mailto:lian.cs....@gmail.com]
>> *Sent:* September 26, 2014 21:24
>> *To:* Haopu Wang; user@spark.apache.org
>> *Subject:* Re: Spark SQL question: is cached SchemaRDD storage controlled by "spark.storage.memoryFraction"?
>>
>> Yes, it is. The in-memory storage used with SchemaRDD also uses RDD.cache() under the hood.
>>
>> On 9/26/14 4:04 PM, Haopu Wang wrote:
>>
>> Hi, I'm querying a big table using Spark SQL. I see very long GC times in some stages. I wonder if I can improve them by tuning the storage parameters.
>>
>> The question is: the SchemaRDD has been cached with the cacheTable() function. So is the cached SchemaRDD part of the memory storage controlled by the "spark.storage.memoryFraction" parameter?
>>
>> Thanks!
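For the original memoryFraction question in the thread above: since cacheTable() uses RDD block storage under the hood, the heap fraction reserved for cached blocks can be tuned when the context is built. A hedged sketch (Spark 1.x configuration; the app name and fraction value are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.storage.memoryFraction controls the share of executor heap
// reserved for cached blocks, including tables cached via cacheTable().
// Lowering it from the 1.x default of 0.6 leaves more heap for execution,
// which can reduce GC pressure when caching a large table.
val conf = new SparkConf()
  .setAppName("cached-table-tuning")
  .set("spark.storage.memoryFraction", "0.4")
val sc = new SparkContext(conf)
```

Shrinking the cache fraction trades cached-table capacity for working memory; whether that helps GC depends on how much of the table actually fits in the cache.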