Hi Gourav,

Today I tried to reproduce your case, but failed.
Can you post your full code, please?
If possible, share the table schema as well; I can generate test data from it.
BTW, my Spark version is 2.1.0.
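
In case it helps, here is a minimal sketch of how the schema could be dumped for
the list (only an illustration, reusing the SparkSession and the "tablename"
placeholder from your snippet below):

from pyspark.sql import SparkSession

# Sketch only: print the schema of the table so it can be shared on the list.
# "tablename" is the placeholder from Gourav's snippet below.
sparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
sparkSession.sql("select * from tablename").printSchema()    # column names and types
sparkSession.sql("DESCRIBE tablename").show(truncate=False)  # same info via SQL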


I am very interested in this case.


 
---Original---
From: "????"<zhoukang199...@gmail.com>
Date: 2017/7/28 17:25:03
To: "Gourav Sengupta"<gourav.sengu...@gmail.com>;
Cc: "user"<user@spark.apache.org>;
Subject: Re: SPARK Storagelevel issues


All right, I did not catch the point, sorry for that. But you can take a
snapshot of the heap and then analyze the heap dump with MAT or other tools.
From the code I cannot find any clue.
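
For example, a minimal sketch (only an illustration, reusing the session options
from your snippet) of making the executors write a heap dump that MAT can open:

from pyspark.sql import SparkSession

# Sketch: ask the executor JVMs to write a heap dump when they hit an OOM,
# so the dump file can be opened in MAT. The flags are standard HotSpot
# options; the /tmp path is just an example.
sparkSession = SparkSession.builder \
    .config("spark.executor.extraJavaOptions",
            "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp") \
    .enableHiveSupport().getOrCreate()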


2017-07-28 17:09 GMT+08:00 Gourav Sengupta <gourav.sengu...@gmail.com>:
Hi,

I have done all of that, but my question is: why should 62 MB of data give a
memory error when we have over 2 GB of memory available?


Therefore, all that Zhoukang has mentioned is not pertinent at all.




Regards,
Gourav Sengupta


On Fri, Jul 28, 2017 at 4:43 AM, Zhoukang <zhoukang199...@gmail.com> wrote:
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER): maybe the
StorageLevel should be changed. Also check your config
"spark.memory.storageFraction", whose default value is 0.5.
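
A minimal sketch of what I mean (names follow your snippet; the 0.6 value is
only an example, not a recommendation):

from pyspark import StorageLevel
from pyspark.sql import SparkSession

# Sketch: use a storage level that can spill to disk instead of failing, and
# (optionally) raise spark.memory.storageFraction from its 0.5 default.
sparkSession = SparkSession.builder \
    .config("spark.memory.storageFraction", "0.6") \
    .enableHiveSupport().getOrCreate()

testdf = sparkSession.sql("select * from tablename")
testdf.persist(StorageLevel.MEMORY_AND_DISK)
testdf.count()   # an action is needed before the cache is actually populated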


2017-07-28 3:04 GMT+08:00 Gourav Sengupta <gourav.sengu...@gmail.com>:
Hi,

I cached a table in a large EMR cluster and it has a size of 62 MB, so I know
the size of the table while cached.


But when I try to cache the table in a smaller cluster, which still has a total
of 3 GB of driver memory and two executors with close to 2.5 GB of memory each,
the job keeps failing with JVM out-of-memory errors.
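
For completeness, a sketch of how that memory layout might be pinned explicitly
when the session is created from a fresh Python process (the 3 GB / 2.5 GB /
two-executor figures are simply the ones described above, not recommendations):

from pyspark.sql import SparkSession

# Sketch only: make the driver/executor sizing explicit. Note that
# spark.driver.memory set this way only takes effect when a new JVM is
# launched; it cannot be changed from inside a running pyspark shell.
sparkSession = SparkSession.builder \
    .config("spark.driver.memory", "3g") \
    .config("spark.executor.memory", "2500m") \
    .config("spark.executor.instances", "2") \
    .enableHiveSupport().getOrCreate()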


Is there something that I am missing?


CODE:
=================================================================
import pyspark  # needed for pyspark.storagelevel.StorageLevel below

sparkSession = spark.builder \
                .config("spark.rdd.compress", "true") \
                .config("spark.serializer",
                        "org.apache.spark.serializer.KryoSerializer") \
                .config("spark.executor.extraJavaOptions",
                        "-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps") \
                .appName("test").enableHiveSupport().getOrCreate()

testdf = sparkSession.sql("select * from tablename")
testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER)

=================================================================



This causes a JVM out-of-memory error.




Regards,
Gourav Sengupta
