GitHub user ankurdave opened a pull request:
https://github.com/apache/spark/pull/3100
[SPARK-3936] Add aggregateMessages, which supersedes mapReduceTriplets
aggregateMessages enables neighborhood computation similarly to
mapReduceTriplets, but it introduces two API improvements:
1. Messages are sent using an imperative interface based on EdgeContext
rather than by returning an iterator of messages. This is more efficient,
providing a 20.2% speedup on PageRank over apache/spark#3054 (uk-2007-05 graph,
10 iterations, 16 r3.2xlarge machines, sped up from 403 s to 322 s).
2. Instead of inferring the required triplet fields through bytecode
inspection, the API requires the user to specify them explicitly by passing
a TripletFields object. This fixes SPARK-3936.
Subsumes apache/spark#2815.
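The difference between the two message-passing styles in point 1 can be sketched without Spark. The following is a toy model under stated assumptions, not the GraphX implementation: `Triplet`, `EdgeContext`, `iteratorStyle`, and `imperativeStyle` are illustrative stand-ins defined here, the `sendToSrc`/`sendToDst` method names are assumed for the sketch, and the TripletFields argument from point 2 is left out.

```scala
// Toy, Spark-free model of the two message-passing styles.
// Every type below is an illustrative stand-in, not a GraphX class.
object MessageStyles {
  type VertexId = Long

  final case class Triplet(srcId: VertexId, dstId: VertexId, srcAttr: Double)

  // Iterator style (mapReduceTriplets-like): the edge function returns its
  // messages, so each call allocates an Iterator plus one tuple per message.
  def iteratorStyle[A](
      triplets: IndexedSeq[Triplet],
      mapFunc: Triplet => Iterator[(VertexId, A)],
      reduceFunc: (A, A) => A): Map[VertexId, A] = {
    val acc = scala.collection.mutable.Map.empty[VertexId, A]
    triplets.foreach { t =>
      mapFunc(t).foreach { case (id, msg) =>
        acc(id) = acc.get(id).map(reduceFunc(_, msg)).getOrElse(msg)
      }
    }
    acc.toMap
  }

  // Imperative style (aggregateMessages-like): the edge function pushes
  // messages into a reusable context, which merges them as they arrive.
  final class EdgeContext[A](merge: (A, A) => A) {
    var triplet: Triplet = null // set by the driver loop before each call
    val acc = scala.collection.mutable.Map.empty[VertexId, A]
    private def send(id: VertexId, msg: A): Unit =
      acc(id) = acc.get(id).map(merge(_, msg)).getOrElse(msg)
    def sendToSrc(msg: A): Unit = send(triplet.srcId, msg)
    def sendToDst(msg: A): Unit = send(triplet.dstId, msg)
  }

  def imperativeStyle[A](
      triplets: IndexedSeq[Triplet],
      sendMsg: EdgeContext[A] => Unit,
      mergeMsg: (A, A) => A): Map[VertexId, A] = {
    val ctx = new EdgeContext[A](mergeMsg) // one context reused for all edges
    var i = 0
    while (i < triplets.length) {
      ctx.triplet = triplets(i)
      sendMsg(ctx)
      i += 1
    }
    ctx.acc.toMap
  }

  // Example input: each edge forwards its source attribute to its destination.
  val triplets: Vector[Triplet] =
    Vector(Triplet(1L, 2L, 0.5), Triplet(3L, 2L, 0.25), Triplet(2L, 3L, 1.0))

  val viaIterator: Map[VertexId, Double] =
    iteratorStyle[Double](triplets, t => Iterator((t.dstId, t.srcAttr)), _ + _)

  val viaContext: Map[VertexId, Double] =
    imperativeStyle[Double](triplets, ctx => ctx.sendToDst(ctx.triplet.srcAttr), _ + _)
}
```

Both calls compute the same per-vertex sums. The imperative version avoids building an iterator and a tuple for every message, which is the kind of per-edge overhead point 1 refers to; how much of the reported speedup comes from that alone is not something this sketch can show.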
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ankurdave/spark aggregateMessages
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3100.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3100
----
commit 4a566dc86624ac3f6dfa747d344c86e4be44adc2
Author: Ankur Dave <[email protected]>
Date: 2014-08-14T02:33:47Z
Optimizations for mapReduceTriplets and EdgePartition
1. EdgePartition now stores local vertex ids instead of global ids. This
avoids hash lookups when looking up vertex attributes and aggregating
messages.
2. Internal iterators in mapReduceTriplets are inlined into a while
loop.
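The effect of storing local vertex ids can be illustrated with a simplified picture of the layout change. This is only a sketch under assumptions: the object, method, and field names below are hypothetical and the code is not taken from EdgePartition. It contrasts a version that looks up attributes by global id in a hash map with one that indexes dense per-partition arrays inside a while loop.

```scala
// Illustrative only: a simplified picture of the layout change, not the
// actual EdgePartition code. All names here are assumptions.
object LocalIdSketch {
  type VertexId = Long

  // Global ids: every attribute read and every aggregation is a hash lookup.
  def sumIntoDstGlobal(
      srcIds: Array[VertexId],
      dstIds: Array[VertexId],
      attrs: Map[VertexId, Double]): Map[VertexId, Double] = {
    val acc = scala.collection.mutable.Map.empty[VertexId, Double]
    var i = 0
    while (i < srcIds.length) {
      val msg = attrs(srcIds(i))                            // hash lookup
      acc(dstIds(i)) = acc.getOrElse(dstIds(i), 0.0) + msg  // hash lookup
      i += 1
    }
    acc.toMap
  }

  // Local ids: edges hold indices into dense arrays, so the inner loop
  // only performs array indexing.
  def sumIntoDstLocal(
      localSrcIds: Array[Int],
      localDstIds: Array[Int],
      local2global: Array[VertexId],
      vertexAttrs: Array[Double]): Map[VertexId, Double] = {
    val sums = new Array[Double](local2global.length)
    val touched = new Array[Boolean](local2global.length)
    var i = 0
    while (i < localSrcIds.length) {
      val dst = localDstIds(i)
      sums(dst) += vertexAttrs(localSrcIds(i))
      touched(dst) = true
      i += 1
    }
    // Translate back to global ids once, after the loop.
    local2global.indices
      .filter(l => touched(l))
      .map(l => local2global(l) -> sums(l))
      .toMap
  }

  // The same three edges expressed in both layouts.
  val globalResult: Map[VertexId, Double] = sumIntoDstGlobal(
    Array(10L, 30L, 20L),
    Array(20L, 20L, 30L),
    Map(10L -> 0.5, 20L -> 1.0, 30L -> 0.25))

  val localResult: Map[VertexId, Double] = sumIntoDstLocal(
    Array(0, 2, 1),
    Array(1, 1, 2),
    Array(10L, 20L, 30L),
    Array(0.5, 1.0, 0.25))
}
```

Both layouts yield the same sums per destination vertex; the local-id version moves the id translation out of the per-edge loop.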
commit b567be2825ea22f2e61fbd9caa34940f5bc404df
Author: Ankur Dave <[email protected]>
Date: 2014-11-04T09:56:48Z
iter.foreach -> while loop
commit c85076de62b4c3344c443d4e85fce8fc47274aac
Author: Ankur Dave <[email protected]>
Date: 2014-11-04T09:58:00Z
Readability improvements
commit e0f8ecc7b678de2b011650ed96b974369730947e
Author: Ankur Dave <[email protected]>
Date: 2014-11-04T09:58:23Z
Take activeSet in ExistingEdgePartitionBuilder
Also rename VertexPreservingEdgePartitionBuilder to
ExistingEdgePartitionBuilder to better reflect its usage.
commit 194a2df94768be9c08ed50654170bad937bd115a
Author: Ankur Dave <[email protected]>
Date: 2014-11-04T10:03:34Z
Test triplet iterator in EdgePartition serialization test
commit 1e80aca308463b0ec7dbeee58c7d1935ebb59e77
Author: Ankur Dave <[email protected]>
Date: 2014-11-01T07:01:21Z
Add aggregateMessages, which supersedes mapReduceTriplets
aggregateMessages enables neighborhood computation similarly to
mapReduceTriplets, but it introduces two API improvements:
1. Messages are sent using an imperative interface based on EdgeContext
rather than by returning an iterator of messages. This is more
efficient, providing a 20.2% speedup on PageRank over
apache/spark#3054 (uk-2007-05 graph, 10 iterations, 16 r3.2xlarge
machines, sped up from 403 s to 322 s).
2. Instead of inferring the required triplet fields through bytecode
inspection, the API requires the user to specify them explicitly by
passing a TripletFields object. This fixes SPARK-3936.
Subsumes apache/spark#2815.
----