Thanks Shixiong! Your response helped me to understand the role of
persist(). No persist() calls were required indeed. I solved my problem by
setting spark.local.dir to allow more space for Spark temporary folder. It
works automatically. I am seeing logs like this:

    Not enough space to cache rdd_0_1 in memory!
    Persisting partition rdd_0_1 to disk instead.

Before I was getting:

    No space left on device


On 9 July 2015 at 11:57, Shixiong Zhu <zsxw...@gmail.com> wrote:

> Spark won't store RDDs to memory unless you use a memory StorageLevel. By
> default, your input and intermediate results won't be put into memory. You
> can call persist if you want to avoid duplicate computation or reading.
> E.g.,
>
> val r1 = context.wholeTextFiles(...)
> val r2 = r1.flatMap(s -> ...)
> val r3 = r2.filter(...)...
> r3.saveAsTextFile(...)
> val r4 = r2.map(...)...
> r4.saveAsTextFile(...)
>
> In the avoid example, r2 will be used twice. To speed up the computation,
> you can call r2.persist(StorageLevel.MEMORY) to store r2 into memory. Then
> r4 will use the data of r2 in memory directly. E.g.,
>
> val r1 = context.wholeTextFiles(...)
> val r2 = r1.flatMap(s -> ...)
> r2.persist(StorageLevel.MEMORY)
> val r3 = r2.filter(...)...
> r3.saveAsTextFile(...)
> val r4 = r2.map(...)...
> r4.saveAsTextFile(...)
>
> See
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
>
>
> Best Regards,
> Shixiong Zhu
>
> 2015-07-09 22:09 GMT+08:00 Michal Čizmazia <mici...@gmail.com>:
>
>> Is there a way how to change the default storage level?
>>
>> If not, how can I properly change the storage level wherever necessary,
>> if my input and intermediate results do not fit into memory?
>>
>> In this example:
>>
>> context.wholeTextFiles(...)
>>     .flatMap(s -> ...)
>>     .flatMap(s -> ...)
>>
>> Does persist() need to be called after every transformation?
>>
>>  context.wholeTextFiles(...)
>>     .persist(StorageLevel.MEMORY_AND_DISK)
>>     .flatMap(s -> ...)
>>     .persist(StorageLevel.MEMORY_AND_DISK)
>>     .flatMap(s -> ...)
>>     .persist(StorageLevel.MEMORY_AND_DISK)
>>
>>  Thanks!
>>
>>
>

Reply via email to