GitHub user ankurdave opened a pull request:
https://github.com/apache/spark/pull/1537
Remove GraphX MessageToPartition for compatibility with sort-based shuffle
MessageToPartition was used in `Graph#partitionBy`. Unlike a Tuple2, it
marked the key as transient to avoid sending it over the network. However, it
was incompatible with sort-based shuffle (SPARK-2045) and was only a minor
optimization: for partitionBy, it cut running time by 6.3% (30.4 s to
28.5 s) and reduced communication by 5.6% (114.2 MB to 107.8 MB).
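
To illustrate the trade-off described above, here is a minimal sketch of a message
class whose key is marked transient, compared against a plain Tuple2. The names
`KeylessMessage` and `TransientKeyDemo` are hypothetical stand-ins, not the
removed GraphX code, and the sketch uses plain Java serialization rather than
Spark's shuffle path. It only shows that the key does not survive a
serialization round trip, which is presumably a problem for any shuffle that
needs the key again on the receiving side.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the removed class, NOT the actual GraphX source.
// The key (`partition`) is marked @transient, so Java serialization skips it.
class KeylessMessage[T](@transient var partition: Int, var data: T) extends Serializable

object TransientKeyDemo {
  // Serialize a value with plain Java serialization and return the bytes.
  def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.toByteArray
  }

  // Rebuild a value from bytes produced by `serialize`.
  def deserialize[A](bytes: Array[Byte]): A = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // The transient key is dropped: it comes back as the Int default value.
    val msg = new KeylessMessage[String](partition = 7, data = "edge")
    val msgCopy = deserialize[KeylessMessage[String]](serialize(msg))
    println(s"KeylessMessage key: ${msg.partition} -> ${msgCopy.partition}, data: ${msgCopy.data}")

    // A Tuple2 carries its key through the round trip unchanged.
    val pair = (7, "edge")
    val pairCopy = deserialize[(Int, String)](serialize(pair))
    println(s"Tuple2 key: ${pair._1} -> ${pairCopy._1}, data: ${pairCopy._2}")
  }
}
```

In this sketch the receiver of a `KeylessMessage` has to recover the key from
context (for example, from the partition it was delivered to), while a Tuple2
keeps the key at the cost of sending it over the wire.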
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ankurdave/spark remove-MessageToPartition
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1537.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1537
----
commit ab713642dd6ef7ede943920c3c1904e17c8253fb
Author: Ankur Dave
Date: 2014-07-23T01:21:26Z
Remove unused VertexBroadcastMsg
commit f9d00546ea1d3e527212ae601232b5b7c5a2e84c
Author: Ankur Dave
Date: 2014-07-23T01:24:54Z
Remove MessageToPartition
It was used in Graph#partitionBy. Unlike a Tuple2, it marked the key as
transient to avoid sending it over the network. However, this is
incompatible with sort-based shuffle (SPARK-2045) and is only a minor
optimization.
----