GitHub user XuTingjun opened a pull request:
https://github.com/apache/spark/pull/9246
[SPARK-5210] Support group event log when app is long-running
For long-running Spark applications (e.g. running for days / weeks), the
Spark event log may grow to be very large.
I think grouping the event log by job is an acceptable solution.
1. To group the event log, one application has two kinds of files: one meta file and
many part files. We put ```StageSubmitted / StageCompleted / TaskResubmit /
TaskStart / TaskEnd / TaskGettingResult / JobStart / JobEnd``` events into the meta file,
and put the other events into part files. The event log layout looks like this:
```
application_1439246697595_0001-meta
application_1439246697595_0001-part1
application_1439246697595_0001-part2
```
2. In the HistoryServer, every part file is treated as a separate application, and
the meta file is replayed after each part file. Below is how a grouped
app is displayed on the HistoryServer web UI:
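The split described above can be sketched as a simple routing rule (a minimal illustration of the idea only, not code from this patch; the event-type names mirror Spark's listener events, and the `route_event` helper is a hypothetical stand-in for the writer logic):

```python
# Hypothetical sketch of the grouping idea from this PR: job/stage/task
# lifecycle events go to the meta file, all other events go to the
# current part file.

META_EVENTS = {
    "SparkListenerStageSubmitted",
    "SparkListenerStageCompleted",
    "SparkListenerTaskResubmit",
    "SparkListenerTaskStart",
    "SparkListenerTaskEnd",
    "SparkListenerTaskGettingResult",
    "SparkListenerJobStart",
    "SparkListenerJobEnd",
}

def route_event(event_type: str) -> str:
    """Return which log file an event belongs in: 'meta' or 'part'."""
    return "meta" if event_type in META_EVENTS else "part"
```

Under this scheme the meta file stays small and structural, while the bulky per-task detail accumulates in part files that can be rolled over as the application runs.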

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/XuTingjun/spark SPARK-5210
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9246.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9246
----
commit 62c982b0048252d88de27e0791cbafbbc69c6c57
Author: xutingjun <[email protected]>
Date: 2015-10-23T07:52:10Z
add big event log
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]