Hello!

I am running Spark with the following options:

-Dspark.serializer=org.apache.spark.serializer.KryoSerializer -Dspark.executor.memory=100g

Now, when I load my dataset, apply some one-to-one transformations, and try to cache the resulting RDD, it runs really slowly and then runs out of memory. When I remove the Kryo serializer and fall back to the default Java serialization, it works just fine and is able to load and cache the 700 GB of resulting data. (By the way, I am not registering my classes with Kryo yet, but I don't think that should be worse than Java serialization - should it?)

Here's a summary of all the experiments I ran:

[image]

Any explanation for this behavior? Also, I noticed that even in the cases where caching succeeded, the "Size in Memory" would climb to a certain level, then drop, and then climb back up. Why does that happen?

Regards,
Vipul
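
P.S. For clarity, this is roughly what I understand class registration with Kryo to look like once I get around to adding it. The record class below is just a placeholder for my actual dataset types, and spark.kryo.registrator would point at the fully qualified name of the registrator class:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Placeholder record type standing in for the real classes in my dataset
case class MyRecord(id: Long, payload: Array[Byte])

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Register every class that actually flows through the cached RDD
    kryo.register(classOf[MyRecord])
    kryo.register(classOf[Array[MyRecord]])
  }
}

// Passed alongside the existing options, e.g.:
// -Dspark.kryo.registrator=MyRegistrator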

