GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/5064
[SPARK-5523][Core][Streaming] Add a cache for hostname in TaskMetrics to decrease the memory usage and GC overhead
The hostname in TaskMetrics is recreated as a new object on every
deserialization, yet the number of distinct hostnames is usually only on the
order of the number of cluster nodes. Adding a cache layer that deduplicates
these objects could therefore reduce memory usage and alleviate GC overhead,
especially for long-running applications that generate jobs rapidly, such as
Spark Streaming.
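To illustrate the idea, here is a minimal sketch in Java (Spark itself is written in Scala on the JVM). It is not the code from this pull request: `HostnameCache`, `canonicalize`, `Metrics`, and `roundTrip` are hypothetical names chosen for the example. It assumes a process-wide `ConcurrentHashMap` that maps each hostname to the first String instance seen with that value, and a custom `readObject` hook that swaps the freshly deserialized String for the cached one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical process-wide pool of canonical hostname strings.
final class HostnameCache {
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    private HostnameCache() {}

    // Returns the first String instance stored for this value, so equal hostnames
    // share one object instead of one copy per deserialized metrics record.
    static String canonicalize(String host) {
        if (host == null) {
            return null; // ConcurrentHashMap rejects null keys
        }
        String existing = CACHE.putIfAbsent(host, host);
        return existing != null ? existing : host;
    }

    static int size() {
        return CACHE.size();
    }
}

// Hypothetical stand-in for a metrics object that carries a hostname.
final class Metrics implements Serializable {
    private static final long serialVersionUID = 1L;
    private String hostname;

    Metrics(String hostname) {
        this.hostname = hostname;
    }

    String hostname() {
        return hostname;
    }

    // Java serialization calls this private hook; after the default fields are
    // read, replace the newly allocated String with the cached instance.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        hostname = HostnameCache.canonicalize(hostname);
    }

    // Helper that serializes and deserializes an object through a fresh stream.
    static Metrics roundTrip(Metrics m) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(m);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Metrics) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

Two records deserialized from separate streams would normally hold two distinct String objects for the same hostname; with the hook, both point at one shared instance. Because the map is unbounded, this relies on the premise above that distinct hostnames stay few; `String.intern()` or a weak interner such as Guava's `Interners.newWeakInterner()` are alternatives with different memory and lookup trade-offs.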
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jerryshao/apache-spark SPARK-5523
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5064.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5064
----
commit 7bc3834970a0edafa87813aa83af954118c19f4e
Author: Saisai Shao <[email protected]>
Date: 2015-03-17T06:41:26Z
Add a pool to cache the hostname
----