I have a working set larger than available memory, so I was hoping that turning on RDD compression (spark.rdd.compress) would let me store more in memory. Strangely, it made no difference: the number of cached partitions, the fraction cached, and the size in memory all remain the same. Any ideas?
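For concreteness, here's a minimal sketch of my setup (the app name and input path are placeholders, and I'm caching with the default storage level via cache()):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("rdd-compress-test")          // placeholder name
      .set("spark.rdd.compress", "true")        // the setting under test
    val sc = new SparkContext(conf)

    val data = sc.textFile("hdfs:///path/to/working-set")  // placeholder path
    data.cache()   // default StorageLevel.MEMORY_ONLY
    data.count()   // materialize the cache

After the count I compare "Size in Memory" and "Fraction Cached" in the Storage tab of the web UI.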
I confirmed that RDD compression wasn't on for the first test and was on for the second:

    scala> sc.getConf.getAll foreach println
    ...
    (spark.rdd.compress,true)
    ...

I haven't tried LZF vs. Snappy, but my guess is that either one should provide at least some benefit.

Thanks.
-Simon
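P.S. In case the codec choice matters, this is how I'd switch it; I believe "lzf" and "snappy" are among the short names spark.io.compression.codec accepts (the default depends on the Spark version, I think):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.rdd.compress", "true")
      .set("spark.io.compression.codec", "lzf")  // or "snappy"; arbitrary choice here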