At 2014-09-17 11:39:19 -0700, spr <s...@yarcdata.com> wrote:
> I'm trying to implement label propagation in GraphX.  The core step of that
> algorithm is
>
> - for each vertex, find the most frequent label among its neighbors and set
> its label to that.
>
> [...]
>
> It seems on the "broken" line above, I don't want to reduce all the values
> to a scalar, as this code does, but rather group them first and then reduce
> them.  Can I do that all within mapReduceTriples?  If not, how do I build
> something that I can then further reduce?

Label propagation is already implemented in GraphX [1]. It handles the
"most frequent label" reduce operation by aggregating a histogram, implemented
as a map from label to frequency, and then taking the most frequent element
from the map at the end. One thing to watch out for: this can create large
aggregation messages for high-degree vertices, since each message can hold one
entry per distinct label among the vertex's neighbors.
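
To make the histogram idea concrete, here is a minimal sketch using plain Scala
collections rather than GraphX itself. It is not the code from [1]; the names
LabelHistogramSketch, sendLabel, mergeHistograms and mostFrequent are made up
for illustration, and labels are assumed to be Long values.

```scala
// Sketch of the histogram-style reduce on plain Scala collections.
// All names here are illustrative assumptions, not GraphX API.
object LabelHistogramSketch {
  type Label = Long
  type LabelHistogram = Map[Label, Long] // label -> frequency

  // "Map" side: each neighbor contributes a one-entry histogram for its label.
  def sendLabel(label: Label): LabelHistogram = Map(label -> 1L)

  // "Reduce" side: merge two histograms by summing the counts per label.
  // The result is still a map, so it can be merged again with other messages.
  def mergeHistograms(a: LabelHistogram, b: LabelHistogram): LabelHistogram =
    (a.keySet ++ b.keySet).iterator.map { k =>
      k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))
    }.toMap

  // Final step: pick the label with the highest count.
  // Ties are broken arbitrarily (whichever maximal entry maxBy sees first).
  def mostFrequent(h: LabelHistogram): Option[Label] =
    if (h.isEmpty) None else Some(h.maxBy(_._2)._1)

  def main(args: Array[String]): Unit = {
    // Labels of one vertex's neighbors (made-up data).
    val neighborLabels = Seq(3L, 7L, 3L, 5L, 3L, 7L)
    val histogram = neighborLabels.map(sendLabel).reduce(mergeHistograms)
    println(mostFrequent(histogram)) // Some(3)
  }
}
```

The point of the sketch is that the reduce function returns a value of the same
type as its inputs (a map), so grouping and counting both happen inside an
ordinary pairwise reduce, and the scalar "winning" label is only extracted
afterwards. In GraphX, a function shaped like mergeHistograms is what one would
presumably pass as the reduce argument, with the map as the message type.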

Ankur

[1] https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala

