git commit: [YARN] SPARK-2668: Add variable of yarn log directory for reference from the log4j configuration

tgraves Tue, 23 Sep 2014 07:43:15 -0700

Repository: spark
Updated Branches:
  refs/heads/master f9d6220c7 -> 14f8c3404



[YARN] SPARK-2668: Add variable of yarn log directory for reference from the 
log4j configuration

Assign the YARN container log directory to the Java system property
"spark.yarn.app.container.log.dir", so that a user-defined log4j.properties can
reference this value and write logs to the YARN container's log directory.
Otherwise, a user-defined file appender will only write to the container's CWD,
and log files in the CWD will not be displayed in the YARN UI, nor can they be
aggregated to the HDFS log directory after the job finishes.

User defined log4j.properties reference example:
log4j.appender.rolling_file.File = ${spark.yarn.app.container.log.dir}/spark.log
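
A fuller configuration along these lines might look like the sketch below. It
assumes log4j 1.x and reuses the appender name rolling_file from the line above.
The size limit, backup count, log level, and conversion pattern are illustrative
assumptions, not values taken from this commit.

```properties
# Route all logging at INFO and above to the rolling file appender.
log4j.rootLogger=INFO, rolling_file

# Roll the log file by size so a long-running job cannot fill the disk.
log4j.appender.rolling_file=org.apache.log4j.RollingFileAppender
# The property below is set per container by this change.
log4j.appender.rolling_file.File=${spark.yarn.app.container.log.dir}/spark.log
# Assumed limits: tune for the application.
log4j.appender.rolling_file.MaxFileSize=50MB
log4j.appender.rolling_file.MaxBackupIndex=5

# Assumed layout: any log4j 1.x layout works here.
log4j.appender.rolling_file.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling_file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Because the file lives in the container's log directory, it should show up
alongside stdout and stderr in the YARN UI and be picked up by log aggregation.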

Author: peng.zhang <peng.zh...@xiaomi.com>

Closes #1573 from renozhang/yarn-log-dir and squashes the following commits:

16c5cb8 [peng.zhang] Update doc
f2b5e2a [peng.zhang] Change variable's name, and update running-on-yarn.md
503ea2d [peng.zhang] Support log4j log to yarn container dir


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/14f8c340
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/14f8c340
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/14f8c340

Branch: refs/heads/master
Commit: 14f8c340402366cb998c563b3f7d9ff7d9940271
Parents: f9d6220
Author: peng.zhang <peng.zh...@xiaomi.com>
Authored: Tue Sep 23 08:45:56 2014 -0500
Committer: Thomas Graves <tgra...@apache.org>
Committed: Tue Sep 23 08:45:56 2014 -0500

----------------------------------------------------------------------
 docs/running-on-yarn.md                                           | 2 ++
 .../src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala  | 3 +++
 .../scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala | 3 +++
 3 files changed, 8 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/14f8c340/docs/running-on-yarn.md
----------------------------------------------------------------------
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 74bcc2e..4b3a49e 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -205,6 +205,8 @@ Note that for the first option, both executors and the application master will s
 log4j configuration, which may cause issues when they run on the same node (e.g. trying to write
 to the same log file).
 
+If you need a reference to the proper location to put log files in YARN so that YARN can properly display and aggregate them, use "${spark.yarn.app.container.log.dir}" in your log4j.properties. For example, log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log. For streaming applications, configuring RollingFileAppender and setting the file location to YARN's log directory will avoid disk overflow caused by large log files, and logs can be accessed using YARN's log utility.
+
 # Important notes
 
 - Before Hadoop 2.2, YARN does not support cores in container resource requests. Thus, when running against an earlier version, the numbers of cores given via command line arguments cannot be passed to YARN.  Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.

http://git-wip-us.apache.org/repos/asf/spark/blob/14f8c340/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
----------------------------------------------------------------------
diff --git a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
index c96f731..6ae4d49 100644
--- a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
+++ b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
@@ -388,6 +388,9 @@ trait ClientBase extends Logging {
         .foreach(p => javaOpts += s"-Djava.library.path=$p")
     }
 
+    // For log4j configuration to reference
+    javaOpts += "-Dspark.yarn.app.container.log.dir=" + ApplicationConstants.LOG_DIR_EXPANSION_VAR
+
     val userClass =
       if (args.userClass != null) {
         Seq("--class", YarnSparkHadoopUtil.escapeForShell(args.userClass))

http://git-wip-us.apache.org/repos/asf/spark/blob/14f8c340/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
----------------------------------------------------------------------
diff --git a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
index 312d82a..f56f72c 100644
--- a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
+++ b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
@@ -98,6 +98,9 @@ trait ExecutorRunnableUtil extends Logging {
         }
     */
 
+    // For log4j configuration to reference
+    javaOpts += "-Dspark.yarn.app.container.log.dir=" + ApplicationConstants.LOG_DIR_EXPANSION_VAR
+
     val commands = Seq(Environment.JAVA_HOME.$() + "/bin/java",
       "-server",
       // Kill if OOM is raised - leverage yarn's failure handling to cause rescheduling.


