I have a working set larger than available memory, so I was hoping to turn
on RDD compression in order to fit more in memory. Strangely, it made no
difference: the number of cached partitions, the fraction cached, and the
size in memory all stayed the same. Any ideas?
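
For context, this is roughly how I'm caching (a simplified sketch; the app
name, input path, and storage level here are placeholders for my actual job):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// spark.rdd.compress has to be set before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("rdd-compress-test")   // placeholder name
  .set("spark.rdd.compress", "true")
val sc = new SparkContext(conf)

// Placeholder path; the real data set is larger than executor memory.
val data = sc.textFile("hdfs:///path/to/input")

// Cache and force materialization, then compare "Size in Memory" on the
// Storage tab of the web UI with compression on and off.
data.persist(StorageLevel.MEMORY_ONLY)
data.count()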

I confirmed that RDD compression was off for the first run and on for the
second:

scala> sc.getConf.getAll foreach println
...
(spark.rdd.compress,true)
...

I haven't tried lzf vs. snappy, but my guess is that either one should
provide at least some benefit.
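
And if I do switch codecs, I assume it goes through
spark.io.compression.codec (which, as I understand it, also covers shuffle
and broadcast data, not just cached RDD blocks). Roughly:

import org.apache.spark.SparkConf

// Codec used for compressing RDD blocks (when spark.rdd.compress is true),
// shuffle output, and broadcast variables. Newer releases accept short names
// like "snappy" or "lzf"; older ones want the full class name, e.g.
// org.apache.spark.io.SnappyCompressionCodec.
val conf = new SparkConf()
  .set("spark.rdd.compress", "true")
  .set("spark.io.compression.codec", "snappy")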

Thanks.
-Simon
