Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/13712#discussion_r68681679
--- Diff: docs/running-on-yarn.md ---
@@ -472,6 +472,29 @@ To use a custom metrics.properties for the application master and executors, upd
     Currently supported services are: <code>hive</code>, <code>hbase</code>
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.rolledLog.includePattern</code></td>
+  <td>(none)</td>
+  <td>
+  Java Regex to filter the log files which match the defined include pattern,
+  and those log files will be aggregated in a rolling fashion.
+  This will be used together with YARN's rolling log aggregation; to enable this feature on the YARN side,
+  <code>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</code> should be
+  configured in yarn-site.xml.
+  Besides, this feature can only be used with Hadoop 2.6.1+, and the log4j appender should be changed to
+  File appender. Based on the file name configured in the log4j configuration (like spark.log),
--- End diff --
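
For reference, a minimal sketch of the two settings discussed above, assuming a Hadoop 2.6.1+ cluster; the `3600` second interval and the `spark*` pattern are illustrative values, not recommendations:

```xml
<!-- yarn-site.xml: enable YARN rolling log aggregation. A positive
     interval makes the NodeManager periodically upload (and delete)
     matching container logs while the application is still running. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```

```properties
# spark-defaults.conf: aggregate log files matching this Java regex in a
# rolling fashion (spark* matches a spark.log file configured in log4j).
spark.yarn.rolledLog.includePattern  spark*
```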
Hi @tgravescs, the problem here is that by default Spark's log4j configuration uses a
`ConsoleAppender`, and the YARN application start command redirects its output into the
`stdout` and `stderr` files. YARN's rolling log aggregation collects the logs and then
deletes them, so once the `stdout` and `stderr` files are collected and deleted they will
not be generated again, and subsequent logs are lost. With a `FileAppender`, a new file is
created once the old one is deleted, which is why only `FileAppender` works here.
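
To make this concrete, a minimal `log4j.properties` sketch that switches from the default `ConsoleAppender` to a `FileAppender`; the `spark.log` file name is only an example and must match the include pattern above:

```properties
# Write logs to a file instead of the console so that, per the behavior
# described above, logging continues after YARN collects and deletes it.
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
# spark.yarn.app.container.log.dir is substituted by Spark on YARN with
# the container's log directory.
log4j.appender.file.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Ship it with `spark-submit --files log4j.properties` so both the application master and the executors pick it up.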