Thanks, Shixiong! Your response helped me understand the role of persist(). Indeed, no persist() calls were required.

I solved my problem by pointing spark.local.dir at a location with more space for Spark's temporary files. It now works automatically, and I am seeing logs like this:

Not enough space to cache rdd_0_1 in memory! Persisting partition rdd_0_1 to disk instead.

Before, I was getting:

No space left on device

On 9 July 2015 at 11:57, Shixiong Zhu <zsxw...@gmail.com> wrote:

> Spark won't store RDDs in memory unless you use a memory StorageLevel. By
> default, your input and intermediate results won't be put into memory. You
> can call persist if you want to avoid duplicate computation or reading.
> E.g.,
>
> val r1 = context.wholeTextFiles(...)
> val r2 = r1.flatMap(s => ...)
> val r3 = r2.filter(...)...
> r3.saveAsTextFile(...)
> val r4 = r2.map(...)...
> r4.saveAsTextFile(...)
>
> In the above example, r2 will be used twice. To speed up the computation,
> you can call r2.persist(StorageLevel.MEMORY_ONLY) to store r2 in memory. Then
> r4 will use the data of r2 in memory directly. E.g.,
>
> val r1 = context.wholeTextFiles(...)
> val r2 = r1.flatMap(s => ...)
> r2.persist(StorageLevel.MEMORY_ONLY)
> val r3 = r2.filter(...)...
> r3.saveAsTextFile(...)
> val r4 = r2.map(...)...
> r4.saveAsTextFile(...)
>
> See
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
>
>
> Best Regards,
> Shixiong Zhu
>
> 2015-07-09 22:09 GMT+08:00 Michal Čizmazia <mici...@gmail.com>:
>
>> Is there a way to change the default storage level?
>>
>> If not, how can I properly change the storage level wherever necessary,
>> if my input and intermediate results do not fit into memory?
>>
>> In this example:
>>
>> context.wholeTextFiles(...)
>>     .flatMap(s -> ...)
>>     .flatMap(s -> ...)
>>
>> Does persist() need to be called after every transformation?
>>
>> context.wholeTextFiles(...)
>>     .persist(StorageLevel.MEMORY_AND_DISK)
>>     .flatMap(s -> ...)
>>     .persist(StorageLevel.MEMORY_AND_DISK)
>>     .flatMap(s -> ...)
>>     .persist(StorageLevel.MEMORY_AND_DISK)
>>
>> Thanks!
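
For reference, the spark.local.dir change mentioned at the top of this message can be made in conf/spark-defaults.conf. This is only a minimal sketch: the path /mnt/bigdisk/spark-tmp is a hypothetical placeholder, not the directory actually used, and should be replaced with a volume that has enough free space.

```properties
# conf/spark-defaults.conf
# Hypothetical path (an assumption for illustration): point this at a volume
# with enough free space for shuffle files and RDD partitions stored on disk.
spark.local.dir    /mnt/bigdisk/spark-tmp
```

Note that on a cluster this property is overridden by the SPARK_LOCAL_DIRS (standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager, so the setting may need to be applied there instead.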