GitHub user coolfrood opened a pull request:
https://github.com/apache/spark/pull/6323
Speed up task scheduling in standalone mode by reusing serializer
My experiments with scheduling very short tasks in standalone cluster mode
indicated that a significant amount of time was being spent scheduling the
tasks (>500ms for 256 tasks). I found that most of the time was spent
creating a new serializer instance for each task. Reusing a single
serializer instance brought the scheduling time down to 8ms.
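The fix described above can be illustrated with a minimal sketch. This is not Spark's Scala code; `TaskSerializer` and `schedule` are hypothetical names, and the "expensive constructor" stands in for whatever per-instance setup the closure serializer performs. The point is only the pattern: hoist serializer construction out of the per-task loop and reuse one instance.

```python
import pickle


class TaskSerializer:
    """Hypothetical serializer whose construction is costly, standing in
    for the per-task serializer instantiation the PR found dominating
    scheduling time."""

    def __init__(self):
        # Stand-in for expensive setup work done once per instance.
        self._protocol = pickle.HIGHEST_PROTOCOL

    def serialize(self, task):
        # Serializing an individual task is cheap by comparison.
        return pickle.dumps(task, protocol=self._protocol)


def schedule(tasks):
    # The change in miniature: construct the serializer once, outside
    # the per-task loop, rather than once per task.
    ser = TaskSerializer()
    return [ser.serialize(t) for t in tasks]


blobs = schedule(list(range(256)))
print(len(blobs))  # 256 serialized task payloads
```

With 256 short tasks, the construction cost is paid once instead of 256 times, which is why the reported scheduling time drops so sharply.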
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/coolfrood/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6323.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6323
----
commit fe530cddae1cb573387f7fc1eb344513c91d69bb
Author: Akshat Aranya <[email protected]>
Date: 2015-05-21T17:23:54Z
Speed up task scheduling in standalone mode by reusing serializer
instead of creating a new one for each task.
----