On YARN in cluster mode. I've tried increasing executor memory from 4GB to 8GB, but it didn't help. The driver memory keeps growing until the container is finally killed by YARN.
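A minimal sketch for watching that growth from inside the Livy session, assuming the Parallel GC shown in the dump below (under another collector the old-generation pool has a different name):

    import java.lang.management.ManagementFactory
    import scala.collection.JavaConverters._

    // Find the old-generation pool of the driver JVM and print how much of it is used.
    // "Old Gen" matches the "PS Old Gen" pool reported by the Parallel GC in the jmap dump.
    val oldGen = ManagementFactory.getMemoryPoolMXBeans.asScala
      .find(_.getName.contains("Old Gen"))
    oldGen.foreach { pool =>
      val usedMb = pool.getUsage.getUsed / 1024.0 / 1024.0
      println(f"driver old gen used: $usedMb%.1f MB")
    }

Run between statements, it should track the same old-generation number that the jmap -heap dump and the YARN diagnostics below report: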
Attaching to process ID 10118, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.161-b12

using thread-local object allocation.
Parallel GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 172490752 (164.5MB)
   MaxNewSize               = 1431306240 (1365.0MB)
   OldSize                  = 345505792 (329.5MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 1092616192 (1042.0MB)
   used     = 18259168 (17.413299560546875MB)
   free     = 1074357024 (1024.5867004394531MB)
   1.67114199237494% used
From Space:
   capacity = 169345024 (161.5MB)
   used     = 0 (0.0MB)
   free     = 169345024 (161.5MB)
   0.0% used
To Space:
   capacity = 167772160 (160.0MB)
   used     = 0 (0.0MB)
   free     = 167772160 (160.0MB)
   0.0% used
PS Old Generation
   capacity = 2863661056 (2731.0MB)
   used     = 2680250800 (2556.0863494873047MB)
   free     = 183410256 (174.9136505126953MB)
   93.59525263593207% used

30723 interned Strings occupying 3189352 bytes.

Application application_1541483082023_0636 failed 1 times due to AM Container for appattempt_1541483082023_0636_000001 exited with exitCode: -104
For more detailed output, check application tracking page: http://bdp-scm-03:8088/proxy/application_1541483082023_0636/ Then, click on links to logs of each attempt.
Diagnostics: Container [pid=10104,containerID=container_e39_1541483082023_0636_01_000001] is running beyond physical memory limits. Current usage: 4.6 GB of 4.5 GB physical memory used; 6.5 GB of 9.4 GB virtual memory used. Killing container.
Dump of the process-tree for container_e39_1541483082023_0636_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 10118 10104 10104 10104 (java) 24792 3125 6873763840 1210012 /usr/java/jdk1.8.0_161/bin/java -server -Xmx4096m -Djava.io.tmpdir=/data/var/yarn/nm/usercache/devuser/appcache/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001/tmp -Dlog4j.configuration=file:/data/spark-conf-4-livy/logs/log4j-driver.properties -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/drivergc.log -Dspark.yarn.app.container.log.dir=/var/yarn/container-logs/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.livy.rsc.driver.RSCDriverBootstrapper --properties-file /data/var/yarn/nm/usercache/devuser/appcache/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001/__spark_conf__/__spark_conf__.properties
|- 10104 10102 10104 10104 (bash) 0 0 116027392 375 /bin/bash -c LD_LIBRARY_PATH=/data/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/../../../CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/lib/native::/data/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/lib/native /usr/java/jdk1.8.0_161/bin/java -server -Xmx4096m -Djava.io.tmpdir=/data/var/yarn/nm/usercache/devuser/appcache/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001/tmp '-Dlog4j.configuration=file:/data/spark-conf-4-livy/logs/log4j-driver.properties' '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCDateStamps' '-Xloggc:/tmp/drivergc.log' -Dspark.yarn.app.container.log.dir=/var/yarn/container-logs/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001
org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.livy.rsc.driver.RSCDriverBootstrapper' --properties-file /data/var/yarn/nm/usercache/devuser/appcache/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001/__spark_conf__/__spark_conf__.properties 1> /var/yarn/container-logs/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001/stdout 2> /var/yarn/container-logs/application_1541483082023_0636/container_e39_1541483082023_0636_01_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.

2018-11-12
lk_hadoop

From: "Rabe, Jens" <jens.r...@iwes.fraunhofer.de>
Sent: 2018-11-12 14:55
Subject: RE: about LIVY-424
To: "user@livy.incubator.apache.org" <user@livy.incubator.apache.org>
Cc:

Do you run Spark in local mode or on a cluster? If on a cluster, try increasing executor memory.

From: lk_hadoop <lk_had...@163.com>
Sent: Monday, November 12, 2018 7:53 AM
To: user <user@livy.incubator.apache.org>; lk_hadoop <lk_had...@163.com>
Subject: Re: about LIVY-424

I'm using livy-0.5.0 with spark 2.3.0. I started a session with 4GB of memory for the driver, and I ran this code several times:

    var tmp1 = spark.sql("use tpcds_bin_partitioned_orc_2")
    var tmp2 = spark.sql("select count(1) from tpcds_bin_partitioned_orc_2.store_sales").show

The table has 5760749 rows of data. After about 10 runs, the driver's physical memory goes beyond 4.5GB and the container is killed by YARN. I can see that the old generation memory keeps growing and cannot be released by GC (a cleaned-up sketch of this repetition follows at the end of this thread).

2018-11-12
lk_hadoop

From: "lk_hadoop" <lk_had...@163.com>
Sent: 2018-11-12 09:37
Subject: about LIVY-424
To: "user" <user@livy.incubator.apache.org>
Cc:

hi, all:
I'm hitting this issue: https://issues.apache.org/jira/browse/LIVY-424 . Does anybody know how to resolve it?

2018-11-12
lk_hadoop
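For reference, a cleaned-up sketch of the repetition described in the message above, assuming the tpcds_bin_partitioned_orc_2 database is present and using the spark session that Livy binds in the interactive session:

    // Repeat the count from the message above; about 10 runs were enough
    // to push the driver container past its 4.5 GB physical memory limit.
    spark.sql("use tpcds_bin_partitioned_orc_2")
    for (_ <- 1 to 10) {
      spark.sql("select count(1) from tpcds_bin_partitioned_orc_2.store_sales").show()
    }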