[jira] [Comment Edited] (SPARK-31281) Hit OOM Error - GC Limit

2020-03-29 Thread Alfred Davidson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070478#comment-17070478
 ] 

Alfred Davidson edited comment on SPARK-31281 at 3/29/20, 6:54 PM:
---

The allocated driver memory will be split between storage, execution, memoryOverhead 
etc. You are executing a join (which is likely to be a broadcast join) and you have 
an action that brings the data to the driver - the driver doesn’t have enough 
memory (and initially tries to GC to free up space). You can either allocate more 
driver memory or change the fraction that it allocates for storage. I believe the 
default value is 0.6, i.e. 60% of the driver heap is reserved for execution and 
storage
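
For reference, a minimal sketch of those knobs (Spark 2.4.x; the values shown are 
the defaults, so any change is illustrative rather than a recommendation). One 
caveat for the snippet in this issue: in local mode the driver JVM is already 
running by the time SparkConf is read, so "spark.driver.memory" set there has no 
effect and is best passed at launch, e.g. spark-submit --driver-memory 8g.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    val conf = new SparkConf()
      .setAppName("test")
      .setMaster("local[2]")
      // Fraction of (heap - 300MB) shared by execution and storage; default 0.6.
      .set("spark.memory.fraction", "0.6")
      // Share of that unified region protected for cached blocks; default 0.5.
      .set("spark.memory.storageFraction", "0.5")

    val spark = SparkSession.builder.config(conf).getOrCreate()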


was (Author: alfiewdavidson):
The allocated driver memory will be split between storage, memoryOverhead etc. As 
your action brings the data to the driver, the driver runs out of memory (and 
initially tries to GC to free up space). You can either allocate more driver 
memory or change the fraction that it allocates for storage. I believe the default 
value is 0.6, e.g. it reserves 60% of driver memory for storage

> Hit OOM Error - GC Limit
> 
>
> Key: SPARK-31281
> URL: https://issues.apache.org/jira/browse/SPARK-31281
> Project: Spark
>  Issue Type: Question
>  Components: Java API
>Affects Versions: 2.4.4
>Reporter: HongJin
>Priority: Critical
>
> MemoryStore is 2.6GB
> conf = new SparkConf().setAppName("test")
>  //.set("spark.sql.codegen.wholeStage", "false")
>  .set("spark.driver.host", "localhost")
>  .set("spark.driver.memory", "4g")
>  .set("spark.executor.cores","1")
>  .set("spark.num.executors","1")
>  .set("spark.executor.memory", "4g")
>  .set("spark.executor.memoryOverhead", "400m")
>  .set("spark.dynamicAllocation.enabled", "true")
>  .set("spark.dynamicAllocation.minExecutors","1")
>  .set("spark.dynamicAllocation.maxExecutors","2")
>  .set("spark.ui.enabled","true") //enable spark UI
>  .set("spark.sql.shuffle.partitions",defaultPartitions)
>  .setMaster("local[2]")
>  sparkSession = SparkSession.builder.config(conf).getOrCreate()
>  
> val df = SparkFactory.sparkSession.sqlContext
>  .read
>  .option("header", "true")
>  .option("delimiter", delimiter)
>  .csv(textFileLocation)
>  
> joinedDf = upperCaseLeft.as("l")
>  .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer")
>  .select(compositeKeysCol ::: nonKeyCols.map(col => 
> mapHelper(col,toleranceValue,caseSensitive)): _*)
>  
> data = joinedDf.take(maxRecords)
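
If it is the collect step that tips the driver over, a hedged sketch of two 
mitigations (joinedDf and maxRecords are from the snippet above; outputPath is a 
hypothetical location):

    // (a) Rule out a driver-side broadcast build: the broadcast table is
    //     assembled on the driver before being shipped to executors.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // (b) Keep the full join result off the driver heap: write it out and
    //     pull back only a small sample for inspection.
    val outputPath = "/tmp/joined" // hypothetical path
    joinedDf.write.mode("overwrite").parquet(outputPath)
    joinedDf.show(20, truncate = false)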



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


