My job is a 1 TB join with a 10 GB table on Spark 1.6.1,
running in YARN mode:

*1. If I enable the external shuffle service, the error is: *
Job aborted due to stage failure: ShuffleMapStage 2 (writeToDirectory at has failed the maximum allowable number
of times: 4. Most recent failure reason:
org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException:
Executor is not registered (appId=application_1473819702737_1239, execId=52)

*2. If I disable the shuffle service and *
*set spark.executor.instances to 80, *
the error is:
ExecutorLostFailure (executor 71 exited caused by one of the running tasks)
Reason: Container marked as failed:
container_1473819702737_1432_01_406847560 on host: Exit status: 52. Diagnostics: Exception
from container-launch: ExitCodeException exitCode=52:
ExitCodeException exitCode=52:

These errors are reported during the shuffle stage.
My data is skewed: some ids have 400 million rows, while other ids have only
1 million rows. Does anybody have ideas on how to solve this problem?
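One common workaround for this kind of join skew (a sketch of the general key-salting idea, not something from this thread) is to append a random salt to the hot keys on the large side and replicate the small table once per salt value, so a 400-million-row id spreads across many shuffle partitions. A minimal illustration on plain Python lists; in the real job this would be done on the RDDs/DataFrames, and `NUM_SALTS` is a hypothetical fan-out factor:

```python
import random
from collections import defaultdict

NUM_SALTS = 8  # hypothetical fan-out for hot keys; tune to the skew

def salt_large(rows):
    # rows: (key, value) pairs on the skewed (large) side.
    # Append a random salt so one hot key maps to NUM_SALTS buckets.
    return [((k, random.randrange(NUM_SALTS)), v) for k, v in rows]

def explode_small(rows):
    # rows: (key, value) pairs on the small side.
    # Replicate each row once per salt so every salted bucket finds a match.
    return [((k, s), v) for k, v in rows for s in range(NUM_SALTS)]

def salted_join(large, small):
    # Equi-join on the salted key, then drop the salt from the output.
    lookup = defaultdict(list)
    for k, v in explode_small(small):
        lookup[k].append(v)
    return [(k[0], lv, sv) for k, lv in salt_large(large) for sv in lookup[k]]
```

The trade-off is that the small side is materialized `NUM_SALTS` times, so this only pays off when the small table really is small relative to the skewed keys.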

*3. My config is: *
I use tungsten-sort in off-heap mode; in on-heap mode, the OOM problem is
even more serious.

spark.driver.cores 4

spark.driver.memory 8g

# use on client mode 8g 4

spark.executor.memory 8g

spark.executor.cores 4

spark.yarn.executor.memoryOverhead 6144

spark.memory.offHeap.enabled true

spark.memory.offHeap.size 4000000000
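For context on the sizing above: spark.memory.offHeap.size is in bytes, and on YARN the off-heap allocation has to fit inside spark.yarn.executor.memoryOverhead, since YARN only requests heap + overhead per container. A quick sanity check of these numbers (my arithmetic, not from the original config):

```python
# Sanity-check the executor sizing implied by the config above.
heap_mb = 8 * 1024                          # spark.executor.memory 8g
overhead_mb = 6144                          # spark.yarn.executor.memoryOverhead
off_heap_mb = 4_000_000_000 / (1024 ** 2)   # spark.memory.offHeap.size (bytes)

container_mb = heap_mb + overhead_mb        # what YARN allocates per executor
print(f"off-heap ~= {off_heap_mb:.0f} MB of {overhead_mb} MB overhead")
print(f"YARN container ~= {container_mb} MB per executor")
```

So the 4 GB off-heap region (~3815 MB) does fit under the 6144 MB overhead here, leaving roughly 2.3 GB of overhead for netty buffers, JVM metaspace, and shuffle fetch buffers.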

Best & Regards
Cyanny LIANG
