I encountered the problem of "insufficient memory". The error is logged
in the file with a name " hs_err_pid86252.log"(attached in the end of this
I launched the spark job by " spark-submit --driver-memory 40g --master
yarn --deploy-mode client". The spark session was created with 10
executors each with 60g memory. The data access pattern is pretty simple, I
keep reading some spark dataframe from hdfs one by one, filter, join with
another dataframe, and then append the results to an dataframe:
for i= 1,2,3....
df1 = spark.read.parquet(file_i)
df_r = df1.filter(...). join(df2)
df_all = df_all.union(df_r)
each file_i is quite small, only a few GB, but there are a lot of such
files. after filtering and join, each df_r is also quite small. When the
program failed, df_all had only 10k rows which should be around 10GB. Each
machine in the cluster has round 80GB memory and 1TB disk space and only
one user was using the cluster when it failed due to insufficient memory.
My questions are:
i). The log file showed that it failed to allocate 8G committing memory.
But how could that happen when the driver and executors have more than 40g
free memory. In fact, only transformations but no actions had run when the
program failed. As I understand, only DAG and book-keeping work is done
during dataframe transformation, no data is brought into the memory. Why
spark still tries to allocate such large memory?
ii). Could manually running garbage collection help?
iii). Did I mis-specify some runtime parameter for jvm, yarn, or spark?
Any help or references are appreciated!
The content of hs_err_pid86252,log:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 8663334912 bytes(~8G) for
committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
# Out of Memory Error (os_linux.cpp:2643), pid=86252,
# JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build
# Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 )
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
--------------- T H R E A D ---------------
Current thread (0x00007fe0bc08c000): VMThread [stack: