At 2014-12-05 02:26:52 -0800, Yifan LI wrote:
> I have a graph in which each vertex keeps several messages for some faraway
> neighbours (I mean, not only immediate neighbours, but up to k hops away, e.g.
> k = 5).
>
> Now, I propose to distribute these messages to their corresponding
> destinations (say, "faraway neighbours"):
>
> - by using the Pregel API, one superstep is enough
> - by using basic Spark operations (groupByKey, leftJoin, etc.) on the vertices
> RDD and its intermediate results.
>
> W.r.t. the communication among machines and the high cost of
> groupByKey/leftJoin, I guess that the 1st option is better?

If messages will only travel along edges (even if they travel over multiple
edges), then the Pregel API should be faster. You'll have to run k supersteps
for messages to propagate k hops away from their origins.
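
To illustrate why k hops cost k supersteps, here is a minimal plain-Python sketch of the idea (not GraphX code). It assumes each message carries a precomputed, non-empty route of vertex ids ending at its destination; the function name `route_by_supersteps` and the route representation are hypothetical and only there for illustration. In each superstep every in-flight message moves exactly one hop.

```python
from collections import defaultdict

def route_by_supersteps(messages, max_supersteps):
    """Move each message one hop per superstep along its route.

    messages: list of (route, payload), where route is a non-empty list of
    vertex ids to visit after leaving the origin; the last id is the
    destination (an assumed representation, for illustration only).
    Returns (delivered, supersteps_used).
    """
    in_flight = [(list(route), payload) for route, payload in messages]
    delivered = defaultdict(list)
    supersteps = 0
    while in_flight and supersteps < max_supersteps:
        supersteps += 1
        next_round = []
        for route, payload in in_flight:
            hop, rest = route[0], route[1:]
            if rest:
                # Not at the destination yet: forward in the next superstep.
                next_round.append((rest, payload))
            else:
                # The route is exhausted, so this hop is the destination.
                delivered[hop].append(payload)
        in_flight = next_round
    return dict(delivered), supersteps

msgs = [([2, 3, 4], "a"), ([2], "b"), ([5, 6, 7, 8, 9], "c")]
delivered, used = route_by_supersteps(msgs, max_supersteps=5)
print(delivered, used)
```

With k = 5, the message with the 5-hop route forces five supersteps even though the 1-hop message arrives after the first one.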

If messages can jump directly between two arbitrary vertices, then doing a
single set of Spark basic operations may be faster than running multiple Pregel
supersteps.
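
As a rough sketch of the direct-jump case, the plain-Python snippet below mimics a single "group by destination, then left-join onto the vertices" pass. It is not Spark code: the function name `route_directly` and the dict/list data layout are assumptions made for illustration, and in Spark the grouping step would involve a shuffle across machines.

```python
from collections import defaultdict

def route_directly(vertices, messages):
    """Deliver every message in one pass, regardless of hop distance.

    vertices: dict of vertex id -> attribute.
    messages: list of (destination id, payload).
    Returns a dict of vertex id -> (attribute, list of received payloads).
    """
    # Step 1: group payloads by destination (the groupByKey analogue).
    grouped = defaultdict(list)
    for dest, payload in messages:
        grouped[dest].append(payload)
    # Step 2: left-join onto the vertices, keeping vertices with no messages.
    return {vid: (attr, grouped.get(vid, [])) for vid, attr in vertices.items()}

vertices = {1: "a", 2: "b", 3: "c"}
messages = [(3, "x"), (1, "y"), (3, "z")]
print(route_directly(vertices, messages))
```

The cost here does not depend on k, which is why this route can win when messages need not follow edges.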
Ankur