GitHub user mallman opened a pull request:
https://github.com/apache/spark/pull/14798
[SPARK-17231][CORE] Avoid building debug or trace log messages unless the
respective log level is enabled
(This PR addresses https://issues.apache.org/jira/browse/SPARK-17231)
## What changes were proposed in this pull request?
While debugging the performance of a large GraphX connected components
computation, we found several places in the `network-common` and
`network-shuffle` code bases where trace or debug log messages are constructed
even if the respective log level is disabled. According to YourKit, these
constructions were creating substantial churn in the eden region. Refactoring
the respective code to construct these messages only when the respective log
level is enabled led to a modest but measurable reduction in our job's task
time, GC time and the ratio thereof.
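
As a rough illustration of the pattern described above (not the code from this patch), the sketch below guards message construction behind a log-level check. It uses `java.util.logging` from the JDK only so that it stays self-contained; the logging API touched by the actual patch is not shown in this message. The class `LogGuardSketch` and its methods `describe`, `logUnguarded`, `logGuarded` and `countBuilds` are hypothetical names invented for this example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; none of these names come from the Spark code base.
public class LogGuardSketch {
    private static final Logger logger =
        Logger.getLogger(LogGuardSketch.class.getName());

    // Counts how many times a log message was actually constructed.
    static int messagesBuilt = 0;

    // Stand-in for a costly message construction (formatting, toString calls).
    static String describe(long chunkId) {
        messagesBuilt++;
        return "chunk " + chunkId;
    }

    // Unguarded: the argument string is built before finest() checks the level.
    static void logUnguarded(long chunkId) {
        logger.finest("Received " + describe(chunkId));
    }

    // Guarded: the message is built only when FINEST is enabled.
    static void logGuarded(long chunkId) {
        if (logger.isLoggable(Level.FINEST)) {
            logger.finest("Received " + describe(chunkId));
        }
    }

    // Issues n log calls with FINEST disabled; returns how many messages were built.
    static int countBuilds(boolean guarded, int n) {
        logger.setLevel(Level.INFO); // FINEST is below INFO, so it is disabled
        messagesBuilt = 0;
        for (int i = 0; i < n; i++) {
            if (guarded) {
                logGuarded(i);
            } else {
                logUnguarded(i);
            }
        }
        return messagesBuilt;
    }

    public static void main(String[] args) {
        System.out.println("unguarded builds: " + countBuilds(false, 1000));
        System.out.println("guarded builds:   " + countBuilds(true, 1000));
    }
}
```

With the level set to `INFO`, the unguarded variant still constructs every message and discards it, while the guarded variant constructs none. That discarded garbage is the kind of eden-region churn the description refers to.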
## How was this patch tested?
Four times, we computed the connected components of a graph with about 2.6
billion vertices and 1.7 billion edges, each time on a separate EC2 cluster
with 8 r3.8xl worker nodes. Two of the computations used Spark master; the
other two used Spark master + this PR. The results from the first test run,
master and master+PR:


The results from the second test run, master and master+PR:


Though the improvements are modest, I believe these results are significant.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/VideoAmp/spark-public spark-17231-logging_perf_improvements
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14798.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14798
----
commit 82ba6f9002aaa169830335441c4b4ddbcc43a868
Author: Michael Allman <[email protected]>
Date: 2016-08-19T01:17:20Z
[SPARK-17231][CORE] Avoid building debug or trace log messages unless
the respective log level is enabled
----