Whoops, I should have mentioned that it's a multivariate median (cf http://www.pnas.org/content/97/4/1423.full.pdf ). It's easy to compute when all the values are accessible at once. I'm not sure it's possible with a combiner. So, I guess the question should be: "Can I use GraphX's Pregel without a combiner?"
On Wed, Apr 23, 2014 at 7:01 PM, Tom Vacek <[email protected]> wrote: > Here are some out-of-the-box ideas: If the elements lie in a fairly small > range and/or you're willing to work with limited precision, you could use > counting sort. Moreover, you could iteratively find the median using > bisection, which would be associative and commutative. It's easy to think > of improvements that would make this approach give a reasonable answer in a > few iterations. I have no idea about mixing algorithmic iterations with > median-finding iterations. > > > On Wed, Apr 23, 2014 at 8:20 PM, Ryan Compton <[email protected]> > wrote: >> >> I'm trying shoehorn a label propagation-ish algorithm into GraphX. I >> need to update each vertex with the median value of their neighbors. >> Unlike PageRank, which updates each vertex with the mean of their >> neighbors, I don't have a simple commutative and associative function >> to use for mergeMsg. >> >> What are my options? It looks like I can choose between: >> >> 1. a hacky mergeMsg (i.e. combine a,b -> Array(a,b) and then do the >> median in vprog) >> 2. collectNeighbors and then median >> 3. ignore GraphX and just do the whole thing with joins (which I >> actually got working, but its slow) >> >> Is there another possibility that I'm missing? > >
