GitHub user jisookim0513 opened a pull request:

    https://github.com/apache/spark/pull/16714

    [SPARK-16333][Core] Enable EventLoggingListener to log less

    ## What changes were proposed in this pull request?
    
    Starting with Spark 2.0, task metrics are implemented as accumulators. 
This is a good design, but it also inflates event logs because the metrics are 
logged twice (once under "Accumulators" and once under "Task Metrics"). For 
applications with many tasks, event logs can reach tens of GB, at which point 
it is not feasible for the Spark History Server to parse the logs and 
reconstruct the job UI. 
    
    This PR adds an option for EventLoggingListener to skip logging the 
internal accumulators that back task metrics. It also adds an option to skip 
logging the "Update Block Statuses" metric, which is quite verbose and is not 
always needed. 
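    
A rough sketch of the filtering idea, written in standalone Python rather than 
as the actual patch. The function name, the two flags, and the JSON keys 
("Task Info", "Accumulables", "Task Metrics", "Updated Blocks") are 
assumptions made for illustration, as is the "internal.metrics." name prefix 
used to recognize task-metric accumulators; none of them are taken from this 
PR's diff.

```python
import copy

# Assumed prefix identifying accumulators that mirror task metrics.
INTERNAL_PREFIX = "internal.metrics."


def trim_task_end_event(event, omit_internal_accumulators=True,
                        omit_updated_blocks=True):
    """Return a copy of a task-end event dict with redundant parts dropped.

    Hypothetical helper: the key names below are illustrative assumptions
    about the event's JSON layout, not a confirmed schema.
    """
    trimmed = copy.deepcopy(event)  # leave the caller's event untouched

    if omit_internal_accumulators:
        info = trimmed.get("Task Info")
        if isinstance(info, dict) and "Accumulables" in info:
            # Keep only user-defined accumulators; the internal ones repeat
            # what is already reported under the task metrics.
            info["Accumulables"] = [
                acc for acc in info["Accumulables"]
                if not (acc.get("Name") or "").startswith(INTERNAL_PREFIX)
            ]

    if omit_updated_blocks:
        metrics = trimmed.get("Task Metrics")
        if isinstance(metrics, dict):
            # Drop the verbose per-block status list, if present.
            metrics.pop("Updated Blocks", None)

    return trimmed
```

Both flags default to trimming here only to keep the sketch short; the PR 
describes them as options, so existing logging behavior would presumably stay 
available.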
    
    After upgrading to Spark 2.0, the event log of one application 
grew from ~1 GB to over 40 GB. With this patch, event log sizes returned to 
roughly what they were with Spark 1.5.2.
    
    ## How was this patch tested?
    
    Unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/metamx/spark enable-less-eventlogs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16714.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16714
    