Hi Gourav,
Today I tried to reproduce your case, but failed. Can you post your full code, please? If possible, give us the table schema and I can generate the data from it. BTW, my Spark is 2.1.0. I am very interested in this case.

---Original---
From: "Zhoukang" <zhoukang199...@gmail.com>
Date: 2017/7/28 17:25:03
To: "Gourav Sengupta" <gourav.sengu...@gmail.com>;
Cc: "user" <user@spark.apache.org>;
Subject: Re: SPARK Storagelevel issues

All right, I did not catch the point, sorry for that. But you can take a snapshot of the heap and then analyze the heap dump with MAT or other tools. From the code I cannot find any clue.

2017-07-28 17:09 GMT+08:00 Gourav Sengupta <gourav.sengu...@gmail.com>:

Hi,

I have done all of that, but my question is: why should 62 MB of data give a memory error when we have over 2 GB of memory available? Therefore all that is mentioned by Zhoukang is not pertinent at all.

Regards,
Gourav Sengupta

On Fri, Jul 28, 2017 at 4:43 AM, Zhoukang <zhoukang199...@gmail.com> wrote:

Regarding testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER): maybe the StorageLevel should be changed. Also check your config "spark.memory.storageFraction", whose default value is 0.5.

2017-07-28 3:04 GMT+08:00 Gourav Sengupta <gourav.sengu...@gmail.com>:

Hi,

I cached a table in a large EMR cluster and its size there is 62 MB, so I know the size of the table while cached. But when I try to cache the same table in a smaller cluster, which still has a total of 3 GB of driver memory and two executors with close to 2.5 GB of memory, the job keeps failing with JVM out-of-memory errors. Is there something that I am missing?

CODE:
=================================================================
import pyspark
from pyspark.sql import SparkSession

# Build a Hive-enabled session with RDD compression and Kryo serialization.
sparkSession = SparkSession.builder \
    .config("spark.rdd.compress", "true") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.executor.extraJavaOptions",
            "-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps") \
    .appName("test").enableHiveSupport().getOrCreate()

# Read the Hive table and cache it in serialized form in memory only.
testdf = sparkSession.sql("select * from tablename")
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER)
=================================================================

This causes a JVM out-of-memory error.

Regards,
Gourav Sengupta
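
For reference, a minimal sketch of the two suggestions made earlier in the thread (change the StorageLevel and check spark.memory.storageFraction), assuming the same Hive table "tablename" as in the original code; the app name and the count() action are illustrative additions, not part of the original job.

=================================================================
from pyspark import StorageLevel
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder \
    .config("spark.memory.storageFraction", "0.5") \
    .appName("storagelevel-sketch").enableHiveSupport().getOrCreate()

# spark.memory.storageFraction (default 0.5) is the share of the unified
# memory region protected from eviction by execution; it does not add
# memory, it only changes how the existing executor heap is split.

testdf = sparkSession.sql("select * from tablename")

# With MEMORY_AND_DISK, cached partitions that do not fit in the storage
# pool are written to local disk and read back from there when needed.
testdf.persist(StorageLevel.MEMORY_AND_DISK)

testdf.count()  # persist() is lazy; an action materializes the cache
=================================================================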
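
And a minimal sketch of the heap-dump approach mentioned above, assuming the session is created fresh (executor JVM options only take effect when the executors launch); the dump path is illustrative and must be writable on the executor nodes. The resulting .hprof file can be opened in Eclipse MAT or a similar tool.

=================================================================
from pyspark.sql import SparkSession

# Ask each executor JVM to write a heap dump when it hits an
# OutOfMemoryError, so the dump can be inspected in MAT afterwards.
sparkSession = SparkSession.builder \
    .config("spark.executor.extraJavaOptions",
            "-XX:+HeapDumpOnOutOfMemoryError "
            "-XX:HeapDumpPath=/tmp/executor-oom.hprof") \
    .appName("heapdump-sketch").enableHiveSupport().getOrCreate()

# For the driver JVM the equivalent options are usually passed on
# spark-submit, e.g.
#   --conf "spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError"
# since the driver is already running by the time this builder code runs.
=================================================================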