GitHub user coolfrood opened a pull request:
https://github.com/apache/spark/pull/6323
Speed up task scheduling in standalone mode by reusing serializer
My experiments with scheduling very short tasks in standalone cluster mode
indicated that a significant amount of time was being spent scheduling the
tasks (>500ms for 256 tasks). I found that most of the time was spent
creating a new serializer instance for each task. Reusing a single
serializer instance brought the scheduling time down to 8ms.
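The fix described above can be illustrated with a minimal sketch. This is not Spark's Scala code; `TaskSerializer` and `schedule` are hypothetical names, and the "expensive constructor" stands in for whatever per-instance setup the closure serializer performs. The point is only the pattern: hoist serializer construction out of the per-task loop and reuse one instance.

```python
import pickle


class TaskSerializer:
    """Hypothetical serializer whose construction is costly, standing in
    for the per-task serializer instantiation the PR found dominating
    scheduling time."""

    def __init__(self):
        # Stand-in for expensive setup work done once per instance.
        self._protocol = pickle.HIGHEST_PROTOCOL

    def serialize(self, task):
        # Serializing an individual task is cheap by comparison.
        return pickle.dumps(task, protocol=self._protocol)


def schedule(tasks):
    # The change in miniature: construct the serializer once, outside
    # the per-task loop, rather than once per task.
    ser = TaskSerializer()
    return [ser.serialize(t) for t in tasks]


blobs = schedule(list(range(256)))
print(len(blobs))  # 256 serialized task payloads
```

With 256 short tasks, the construction cost is paid once instead of 256 times, which is why the reported scheduling time drops so sharply.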
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/coolfrood/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6323.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6323
----
commit fe530cddae1cb573387f7fc1eb344513c91d69bb
Author: Akshat Aranya <[email protected]>
Date: 2015-05-21T17:23:54Z
Speed up task scheduling in standalone mode by reusing serializer
instead of creating a new one for each task.
----