[GitHub] spark pull request: SPARK[1784]: Adding a balancedPartitioner

syedhashmi Sun, 25 May 2014 18:05:28 -0700

GitHub user syedhashmi opened a pull request:

    https://github.com/apache/spark/pull/876


    SPARK[1784]: Adding a balancedPartitioner

    This change adds a balanced partitioner alongside the existing partitioners.
    The new partitioner uses a round-robin strategy to allocate keys to
    partitions, so that an RDD ends up with balanced partitions.
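
As a rough illustration of the round-robin idea described above, here is a minimal sketch in plain Python. It is not the code from this pull request and does not use Spark's Partitioner API; the class name RoundRobinPartitioner and the method get_partition are hypothetical. It assumes that each previously unseen key takes the next partition in cyclic order and that the choice is remembered, so a repeated key always lands in the same partition.

```python
from collections import Counter


class RoundRobinPartitioner:
    """Hypothetical illustration; not the class added in the pull request."""

    def __init__(self, num_partitions):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._assigned = {}  # key -> partition index, so repeated keys stay put
        self._next = 0       # partition the next unseen key will receive

    def get_partition(self, key):
        # A key seen before keeps its partition; a new key takes the next slot.
        if key not in self._assigned:
            self._assigned[key] = self._next
            self._next = (self._next + 1) % self.num_partitions
        return self._assigned[key]


partitioner = RoundRobinPartitioner(3)
keys = ["a", "b", "c", "d", "e", "f", "a", "d"]

# Partition chosen for each record, in arrival order.
placement = [partitioner.get_partition(k) for k in keys]

# Number of distinct keys held by each partition.
sizes = Counter(partitioner.get_partition(k) for k in set(keys))
```

With six distinct keys and three partitions, each partition receives two keys regardless of how the keys hash. The sketch keeps the key-to-partition map in a single process; how that state would be shared across a cluster is outside what this example shows.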

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/syedhashmi/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #876
    
----
commit 4ca94cc155aea4be36505d5f37d037e209078196
Author: Syed Hashmi <[email protected]>
Date:   2014-05-09T23:32:32Z

    [SPARK-1784] Add a new partitioner
    
    This change adds a new partitioner which allows users
    to specify # of keys per partition.

commit 66680150aa705bf301f79367647e671cb5ef9e21
Author: CodingCat <[email protected]>
Date:   2014-05-10T04:50:23Z

    SPARK-1686: keep schedule() calling in the main thread
    
    https://issues.apache.org/jira/browse/SPARK-1686
    
    moved from original JIRA (by @markhamstra):
    
    In deploy.master.Master, the completeRecovery method is the last thing to
    be called when a standalone Master is recovering from failure. It is
    responsible for resetting some state, relaunching drivers, and eventually
    resuming its scheduling duties.
    
    There are currently four places in Master.scala where completeRecovery is
    called. Three of them are from within the actor's receive method, and aren't
    problems. The last starts from within receive when the ElectedLeader message
    is received, but the actual completeRecovery() call is made from the Akka
    scheduler. That means that it will execute on a different scheduler thread,
    and Master itself will end up running (i.e., schedule()) from that Akka
    scheduler thread.
    
    In this PR, I added a new master message TriggerSchedule to trigger the
    "local" call of schedule() in the scheduler thread.
    
    Author: CodingCat <[email protected]>
    
    Closes #639 from CodingCat/SPARK-1686 and squashes the following commits:
    
    81bb4ca [CodingCat] rename variable
    69e0a2a [CodingCat] style fix
    36a2ac0 [CodingCat] address Aaron's comments
    ec9b7bb [CodingCat] address the comments
    02b37ca [CodingCat] keep schedule() calling in the main thread

commit fd36542c5dd2eaf8657e0d6aff65ab2365beef56
Author: Syed Hashmi <[email protected]>
Date:   2014-05-26T00:55:17Z

    [SPARK-1784] Add a balanced partitioner
    
    This partitioner uses a round-robin allocation strategy for keys
    to end up with balanced partitions for an RDD.

commit 4354836bda0f8f3c5286fa244ea6a655b4cda386
Author: Syed Hashmi <[email protected]>
Date:   2014-05-26T01:02:19Z

    Revert "SPARK-1686: keep schedule() calling in the main thread"
    
    This reverts commit 66680150aa705bf301f79367647e671cb5ef9e21.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
