GitHub user ankurdave opened a pull request:

    https://github.com/apache/spark/pull/1537

    Remove GraphX MessageToPartition for compatibility with sort-based shuffle

    MessageToPartition was used in `Graph#partitionBy`. Unlike a Tuple2, it
    marked the key as transient to avoid sending it over the network. However,
    it was incompatible with sort-based shuffle (SPARK-2045) and represented
    only a minor optimization: for partitionBy, it improved performance by 6.3%
    (30.4 s to 28.5 s) and reduced communication by 5.6% (114.2 MB to 107.8 MB).
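
To illustrate the mechanism described above: a field marked transient is skipped by Java serialization and comes back as its default value after deserialization. The following is a minimal Java sketch under that assumption, not the GraphX code. `TransientKeyDemo`, `KeylessMessage`, and `roundTrip` are hypothetical names introduced for illustration, and Java is used only because `transient` is a JVM serialization feature.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientKeyDemo {

    /** Hypothetical key/value record with a transient key (not the GraphX class). */
    static class KeylessMessage<T extends Serializable> implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: Java serialization skips this field, so the key is not written out.
        transient int partition;
        T data;

        KeylessMessage(int partition, T data) {
            this.partition = partition;
            this.data = data;
        }
    }

    /** Serializes an object to a byte array and reads it back, as a network hop would. */
    static <T extends Serializable> T roundTrip(T obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        KeylessMessage<String> sent = new KeylessMessage<>(7, "edge payload");
        KeylessMessage<String> received = roundTrip(sent);
        // The payload survives; the transient key is reset to the int default.
        System.out.println(received.partition + " " + received.data); // prints "0 edge payload"
    }
}
```

After the round trip the payload is intact but the key is 0. Presumably any shuffle path that serializes a record and later needs its key again would be affected in the same way; that is an assumption, since the description only states the incompatibility.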

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankurdave/spark remove-MessageToPartition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1537
    
----
commit ab713642dd6ef7ede943920c3c1904e17c8253fb
Author: Ankur Dave <[email protected]>
Date:   2014-07-23T01:21:26Z

    Remove unused VertexBroadcastMsg

commit f9d00546ea1d3e527212ae601232b5b7c5a2e84c
Author: Ankur Dave <[email protected]>
Date:   2014-07-23T01:24:54Z

    Remove MessageToPartition
    
    It was used in Graph#partitionBy. Unlike a Tuple2, it marked the key as
    transient to avoid sending it over the network. However, this is
    incompatible with sort-based shuffle (SPARK-2045) and is only a minor
    optimization.

----

