GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/5064

    [SPARK-5523][Core][Streaming] Add a cache for hostname in TaskMetrics to 
decrease the memory usage and GC overhead

    The hostname in TaskMetrics is created through deserialization, so each 
deserialized instance carries its own copy of the string. The number of 
distinct hostnames is usually only on the order of the number of cluster 
nodes, so adding a cache layer to deduplicate these objects could reduce 
memory usage and alleviate GC overhead, especially for long-running 
applications that generate jobs rapidly, such as Spark Streaming.
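
    The idea can be illustrated with a minimal sketch. This is not the code 
from the patch: the class name HostnameCache, the method canonical, and the 
sample hostname are hypothetical, and the sketch only assumes that a 
concurrent map is used to hand back one shared String per distinct hostname.

```java
import java.util.concurrent.ConcurrentHashMap;

public class HostnameCache {
    // Hypothetical pool: maps each distinct hostname to one canonical String instance.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for this hostname, registering it on first sight.
    public static String canonical(String host) {
        String existing = POOL.putIfAbsent(host, host);
        return existing == null ? host : existing;
    }

    public static void main(String[] args) {
        // new String(...) simulates two separately deserialized copies of one hostname.
        String a = new String("node-1.example.com");
        String b = new String("node-1.example.com");
        System.out.println(a == b);                       // false: two distinct objects
        System.out.println(canonical(a) == canonical(b)); // true: one pooled object
    }
}
```

    With such a pool, the number of retained hostname strings is bounded by 
the number of distinct hosts rather than by the number of deserialized 
TaskMetrics objects; the duplicates become garbage as soon as they are 
replaced by the pooled instance.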

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-5523

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5064
    
----
commit 7bc3834970a0edafa87813aa83af954118c19f4e
Author: Saisai Shao <[email protected]>
Date:   2015-03-17T06:41:26Z

    Add a pool to cache the hostname

----

