GitHub user syedhashmi opened a pull request:
https://github.com/apache/spark/pull/876
SPARK-1784: Adding a balancedPartitioner
This change adds a balanced partitioner to the existing partitioners. The new
partitioner uses a round-robin strategy to allocate keys to partitions so that
an RDD ends up with balanced partitions.
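The round-robin idea can be illustrated with a minimal Python sketch. This is not the PR's Scala code. The class and method names are hypothetical and only loosely mirror the shape of Spark's Partitioner (a partition count plus a key-to-partition function). The sketch assumes all keys are seen by a single process. Each distinct key gets the next partition in rotation, and the assignment is remembered so that equal keys always land in the same partition.

```python
from typing import Dict, Hashable


class RoundRobinPartitioner:
    """Hypothetical sketch: hands out partitions to distinct keys in round-robin order."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._assigned: Dict[Hashable, int] = {}  # key -> partition, keeps repeats stable
        self._next = 0  # partition the next unseen key will receive

    def get_partition(self, key: Hashable) -> int:
        if key not in self._assigned:
            self._assigned[key] = self._next
            self._next = (self._next + 1) % self.num_partitions
        return self._assigned[key]


p = RoundRobinPartitioner(3)
print([p.get_partition(k) for k in ["a", "b", "c", "d", "a", "e"]])  # [0, 1, 2, 0, 0, 1]
```

The memo table is the key design point. Without it, a repeated key could be sent to a different partition. A partitioner object is also shipped to each task, so per-process state like this would not by itself give a consistent mapping across a cluster.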
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/syedhashmi/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/876.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #876
----
commit 4ca94cc155aea4be36505d5f37d037e209078196
Author: Syed Hashmi <[email protected]>
Date: 2014-05-09T23:32:32Z
[SPARK-1784] Add a new partitioner
This change adds a new partitioner that allows users
to specify the number of keys per partition.
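A fixed keys-per-partition scheme can be sketched in Python under one assumption: keys are numbered in order of first appearance, and the i-th distinct key goes to partition i // keys_per_partition. The class name and this numbering rule are hypothetical illustrations, not taken from the commit.

```python
import math
from typing import Dict, Hashable


class KeysPerPartitionPartitioner:
    """Hypothetical sketch: packs a fixed number of distinct keys into each partition."""

    def __init__(self, keys_per_partition: int) -> None:
        if keys_per_partition <= 0:
            raise ValueError("keys_per_partition must be positive")
        self.keys_per_partition = keys_per_partition
        self._assigned: Dict[Hashable, int] = {}  # key -> partition, in first-seen order

    def get_partition(self, key: Hashable) -> int:
        if key not in self._assigned:
            # The n-th new key (0-based) fills partition n // keys_per_partition.
            self._assigned[key] = len(self._assigned) // self.keys_per_partition
        return self._assigned[key]

    @property
    def num_partitions(self) -> int:
        # Partitions needed for the keys seen so far (last one may be partly filled).
        return math.ceil(len(self._assigned) / self.keys_per_partition)


p = KeysPerPartitionPartitioner(2)
print([p.get_partition(k) for k in "abcde"])  # [0, 0, 1, 1, 2]
print(p.num_partitions)  # 3
```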
commit 66680150aa705bf301f79367647e671cb5ef9e21
Author: CodingCat <[email protected]>
Date: 2014-05-10T04:50:23Z
SPARK-1686: keep schedule() calling in the main thread
https://issues.apache.org/jira/browse/SPARK-1686
Moved from the original JIRA (by @markhamstra):
In deploy.master.Master, the completeRecovery method is the last thing to
be called when a standalone Master is recovering from failure. It is
responsible for resetting some state, relaunching drivers, and eventually
resuming its scheduling duties.
There are currently four places in Master.scala where completeRecovery is
called. Three of them are from within the actor's receive method and are not a
problem. The last starts from within receive when the ElectedLeader message is
received, but the actual completeRecovery() call is made from the Akka
scheduler. That means it will execute on a different scheduler thread, and
Master itself will end up running (i.e., calling schedule()) from that Akka
scheduler thread.
In this PR, I added a new master message, TriggerSchedule, so that schedule()
is called locally in the main (actor) thread rather than on the Akka scheduler
thread.
Author: CodingCat <[email protected]>
Closes #639 from CodingCat/SPARK-1686 and squashes the following commits:
81bb4ca [CodingCat] rename variable
69e0a2a [CodingCat] style fix
36a2ac0 [CodingCat] address Aaron's comments
ec9b7bb [CodingCat] address the comments
02b37ca [CodingCat] keep schedule() calling in the main thread
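The fix follows a common actor pattern. Code that runs on a foreign thread does not call schedule() directly. It posts a message to the actor's mailbox, and the actor's own thread makes the call. The Python sketch below is only an analogy, not Akka or Spark code. A thread draining a queue stands in for the Master actor, and threading.Timer stands in for the Akka scheduler. MiniMaster and its methods are hypothetical names; only the message name TriggerSchedule comes from the commit.

```python
import queue
import threading
from typing import List


class MiniMaster:
    """Hypothetical stand-in for an actor: one thread drains a mailbox."""

    def __init__(self) -> None:
        self.mailbox: "queue.Queue[str]" = queue.Queue()
        self.schedule_threads: List[str] = []  # threads on which schedule() ran
        self.thread = threading.Thread(target=self._run, name="actor-thread")
        self.thread.start()

    def _run(self) -> None:
        # Plays the role of receive: every message is handled on this one thread.
        while True:
            msg = self.mailbox.get()
            if msg == "TriggerSchedule":
                self.schedule()
            elif msg == "Stop":
                return

    def schedule(self) -> None:
        self.schedule_threads.append(threading.current_thread().name)

    def complete_recovery_from_timer(self) -> None:
        # Runs on the timer's thread: instead of calling schedule() here,
        # post a message so the actor thread performs the call.
        self.mailbox.put("TriggerSchedule")


master = MiniMaster()
timer = threading.Timer(0.01, master.complete_recovery_from_timer)
timer.start()
timer.join()  # TriggerSchedule is now queued ahead of Stop
master.mailbox.put("Stop")
master.thread.join()
print(master.schedule_threads)  # ['actor-thread']
```

Because the mailbox is FIFO and drained by a single thread, schedule() never runs concurrently with other message handling. That is the property the commit is after.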
commit fd36542c5dd2eaf8657e0d6aff65ab2365beef56
Author: Syed Hashmi <[email protected]>
Date: 2014-05-26T00:55:17Z
[SPARK-1784] Add a balanced partitioner
This partitioner uses a round-robin allocation strategy for keys
so that an RDD ends up with balanced partitions.
commit 4354836bda0f8f3c5286fa244ea6a655b4cda386
Author: Syed Hashmi <[email protected]>
Date: 2014-05-26T01:02:19Z
Revert "SPARK-1686: keep schedule() calling in the main thread"
This reverts commit 66680150aa705bf301f79367647e671cb5ef9e21.
----