Losing executors due to memory problems

Muttineni, Vinay Thu, 11 Aug 2016 16:43:13 -0700

Hello,
I have a Spark job that reads data from two tables into two DataFrames, which are then converted to RDDs. I then join them on a common key.
Each table is about 10 TB, but after filtering, each of the two RDDs is about 500 GB.
I have 800 executors with 8 GB of memory per executor.
Everything works fine until the join stage, which throws the error below.
I tried increasing the number of partitions before the join stage, but it doesn't change anything.
Any ideas on how I can fix this, and what I might be doing wrong?
Thanks,
Vinay
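The join described above can be illustrated without Spark. The following is a minimal pure-Python sketch of a generic hash-partitioned inner join: both sides are bucketed by hash(key) % num_partitions, and then each pair of buckets is joined locally. It is not Spark's implementation, and the function names and sample data are hypothetical. It shows one possible reason that raising the partition count might not help. If a single key is very frequent (skewed), all of its rows land in the same partition no matter how many partitions there are. The post does not say whether the keys in this job are skewed.

```python
from collections import defaultdict


def hash_partition(records, num_partitions):
    """Bucket (key, value) pairs by hash(key) % num_partitions.

    Python's built-in hash() stands in for a real partitioner's hash function.
    """
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets


def partitioned_join(left, right, num_partitions):
    """Inner-join two lists of (key, value) pairs partition by partition.

    Returns the joined rows and the number of input rows (both sides)
    that each partition had to process.
    """
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)
    joined, sizes = [], []
    for lpart, rpart in zip(left_parts, right_parts):
        sizes.append(len(lpart) + len(rpart))
        # Each partition builds an in-memory lookup of its right-side rows.
        lookup = defaultdict(list)
        for key, value in rpart:
            lookup[key].append(value)
        for key, value in lpart:
            for other in lookup.get(key, []):
                joined.append((key, (value, other)))
    return joined, sizes


# Hypothetical data: 1,000 keys with one row per side, plus 5,000 extra
# left-side rows for the single "hot" key 0.
left = [(k, "L") for k in range(1000)] + [(0, "L")] * 5000
right = [(k, "R") for k in range(1000)]

for n in (10, 100):
    rows, sizes = partitioned_join(left, right, n)
    # Average partition load shrinks tenfold; the largest one barely moves.
    print(f"{n} partitions: {len(rows)} rows, "
          f"avg load {sum(sizes) / n:.0f}, max load {max(sizes)}")
```

Going from 10 to 100 partitions cuts the average load from 700 to 70 input rows per partition. However, the partition that holds key 0 still receives more than 5,000 rows.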


ExecutorLostFailure (executor 208 exited caused by one of the running tasks) 
Reason: Container marked as failed: container_1469773002212_96618_01_000246 on 
host:. Exit status: 143. Diagnostics: Container 
[pid=31872,containerID=container_1469773002212_96618_01_000246] is running 
beyond physical memory limits. Current usage: 15.2 GB of 15.1 GB physical 
memory used; 15.9 GB of 31.8 GB virtual memory used. Killing container.
Dump of the process-tree for container_1469773002212_96618_01_000246 :
         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
         |- 31883 31872 31872 31872 (java) 519517 41888 17040175104 3987193 
/usr/java/latest/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms14336m 
-Xmx14336m 
-Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp
 -Dspark.driver.port=32988 -Dspark.ui.port=0 -Dspark.akka.frameSize=256 
-Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246
 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend 
--driver-url spark://CoarseGrainedScheduler@10.12.7.4:32988 --executor-id 208 
--hostname x.com --cores 11 --app-id application_1469773002212_96618
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar
 --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar
 --user-class-path 
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar
 --user-class-path 
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar
 --user-class-path 
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar
         |- 31872 16580 31872 31872 (bash) 0 0 9146368 267 /bin/bash -c 
LD_LIBRARY_PATH=/apache/hadoop/lib/native:/apache/hadoop/lib/native/Linux-amd64-64:
 /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms14336m 
-Xmx14336m -Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp
 '-Dspark.driver.port=32988' '-Dspark.ui.port=0' '-Dspark.akka.frameSize=256' 
-Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246
 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend 
--driver-url spark://CoarseGrainedScheduler@1.4.1.6:32988 --executor-id 208 
--hostname x.com --cores 11 --app-id application_1469773002212_96618 
--user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar
 --user-class-path 
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar
 --user-class-path 
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar
 --user-class-path 
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar
 --user-class-path 
file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar
 1> 
/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stdout
 2> 
/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
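The figures in the log above can be turned into a quick memory budget. The following Python sketch is a rough, hypothetical calculation, and the function name is made up for illustration. It assumes that YARN's "GB" means 1024 MB, and it treats heap divided by cores as a naive per-task share, which ignores how Spark subdivides the heap internally. It also decodes exit status 143 using the common shell convention that a status above 128 means the process was killed by signal (status - 128).

```python
import signal

# Figures copied from the log above; units as printed there.
HEAP_MB = 14336      # -Xms14336m / -Xmx14336m
CORES = 11           # --cores 11 (concurrent task slots)
LIMIT_GB = 15.1      # "... of 15.1 GB physical memory used"
USED_GB = 15.2       # "Current usage: 15.2 GB ..."
EXIT_STATUS = 143    # "Exit status: 143"


def container_budget(heap_mb, cores, limit_gb, used_gb, mb_per_gb=1024):
    """Rough container arithmetic; assumes YARN's 'GB' means 1024 MB."""
    limit_mb = limit_gb * mb_per_gb
    return {
        # physical memory the container may use beyond the JVM heap
        "non_heap_headroom_mb": round(limit_mb - heap_mb),
        # how far the process tree exceeded the limit when it was killed
        "overage_mb": round((used_gb - limit_gb) * mb_per_gb),
        # naive per-task heap share if every core runs a task at once
        "heap_per_task_mb": round(heap_mb / cores),
    }


print(container_budget(HEAP_MB, CORES, LIMIT_GB, USED_GB))
# Shell convention: a status above 128 means "killed by signal (status - 128)".
print(signal.Signals(EXIT_STATUS - 128).name)
```

With these numbers, the container has only about 1.1 GB of physical memory outside the 14 GB heap. The process tree exceeded its limit by roughly 0.1 GB before the container was terminated with SIGTERM.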
