GitHub user qqsun8819 opened a pull request:
https://github.com/apache/spark/pull/214
[SPARK-1141] [WIP] Parallelize Task Serialization
https://spark-project.atlassian.net/browse/SPARK-1141
@kayousterhout
Copied from JIRA (the design doc in JIRA is old; I'll update it later):
TaskSetManager.resourceOffer will return a TaskDescWithoutSerializeTask
object. This object is a partial copy of TaskDescription: it omits the
_serializedTask ByteBuffer and holds a Task object instead. The serialization
step currently inside TaskSetManager.resourceOffer will move to a "Runnable"
worker thread in TaskSchedulerImpl, which will run in a thread pool.
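The split described above can be illustrated with a minimal, standalone Java sketch. It is not the code in this pull request and does not use Spark's classes: Task, TaskDescription, and TaskDescWithoutSerializeTask here are simplified stand-ins, serialize() is a placeholder that only encodes a string, and serializeInParallel() is a hypothetical helper name. The sketch also submits a Callable rather than a Runnable so the result can be collected through a Future.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSerializeSketch {
    /** Hypothetical stand-in for a Spark Task; it only carries a payload. */
    static final class Task {
        final String payload;
        Task(String payload) { this.payload = payload; }
    }

    /** Like TaskDescription, but holds the Task itself instead of its bytes. */
    static final class TaskDescWithoutSerializeTask {
        final long taskId;
        final String executorId;
        final Task task;
        TaskDescWithoutSerializeTask(long taskId, String executorId, Task task) {
            this.taskId = taskId;
            this.executorId = executorId;
            this.task = task;
        }
    }

    /** Simplified stand-in for TaskDescription with the serialized bytes. */
    static final class TaskDescription {
        final long taskId;
        final String executorId;
        final ByteBuffer serializedTask;
        TaskDescription(long taskId, String executorId, ByteBuffer serializedTask) {
            this.taskId = taskId;
            this.executorId = executorId;
            this.serializedTask = serializedTask;
        }
    }

    /** Cheap step on the scheduling thread: no serialization happens here. */
    static TaskDescWithoutSerializeTask resourceOffer(long taskId, String executorId, String payload) {
        return new TaskDescWithoutSerializeTask(taskId, executorId, new Task(payload));
    }

    /** Placeholder serializer (assumption): UTF-8 bytes of the payload. */
    static ByteBuffer serialize(Task task) {
        return ByteBuffer.wrap(task.payload.getBytes(StandardCharsets.UTF_8));
    }

    /** Expensive step: each offer is serialized by a thread-pool worker. */
    static List<TaskDescription> serializeInParallel(List<TaskDescWithoutSerializeTask> offers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<TaskDescription>> pending = new ArrayList<>();
            for (TaskDescWithoutSerializeTask offer : offers) {
                Callable<TaskDescription> job = () ->
                    new TaskDescription(offer.taskId, offer.executorId, serialize(offer.task));
                pending.add(pool.submit(job));
            }
            // Collect in submission order so results line up with the offers.
            List<TaskDescription> done = new ArrayList<>();
            for (Future<TaskDescription> f : pending) {
                done.add(f.get());
            }
            return done;
        } catch (Exception e) {
            throw new RuntimeException("task serialization failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<TaskDescWithoutSerializeTask> offers = new ArrayList<>();
        offers.add(resourceOffer(0L, "exec-1", "first task"));
        offers.add(resourceOffer(1L, "exec-2", "second task"));
        for (TaskDescription d : serializeInParallel(offers, 2)) {
            System.out.println(d.taskId + " on " + d.executorId + ": "
                + d.serializedTask.remaining() + " bytes");
        }
    }
}
```

The point of the split is that the scheduling path only builds the lightweight description, while the potentially slow serialization of each task runs concurrently on pool threads.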
DriverSuite fails in my own environment; I'm working on a fix.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/qqsun8819/spark task-serialize
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/214.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #214
----
commit 53795965dd16c54a4981ef4ee754f326663f9795
Author: Ouyang Jin <[email protected]>
Date: 2014-03-16T15:57:43Z
Initial version of Parallelize Task Serialization in dev code, but this
version has a chance to hang in multi-task execution and needs debug
commit 0bb37447d403c63b21b06cf15a612eb363c701da
Author: OuYang Jin <[email protected]>
Date: 2014-03-23T14:47:56Z
Merge remote-tracking branch 'upstream/master' into task-serialize
commit 177195d20ddef34d339f6385d50382944c9c149d
Author: OuYang Jin <[email protected]>
Date: 2014-03-24T06:16:27Z
Modify asynchronous sleep wait to pass job running case
----