Here are some out-of-the-box ideas: If the elements lie in a fairly small range and/or you're willing to work with limited precision, you could use counting sort. Moreover, you could iteratively find the median using bisection, which would be associative and commutative. It's easy to think of improvements that would make this approach give a reasonable answer in a few iterations. I have no idea about mixing algorithmic iterations with median-finding iterations.
On Wed, Apr 23, 2014 at 8:20 PM, Ryan Compton <compton.r...@gmail.com>wrote: > I'm trying shoehorn a label propagation-ish algorithm into GraphX. I > need to update each vertex with the median value of their neighbors. > Unlike PageRank, which updates each vertex with the mean of their > neighbors, I don't have a simple commutative and associative function > to use for mergeMsg. > > What are my options? It looks like I can choose between: > > 1. a hacky mergeMsg (i.e. combine a,b -> Array(a,b) and then do the > median in vprog) > 2. collectNeighbors and then median > 3. ignore GraphX and just do the whole thing with joins (which I > actually got working, but its slow) > > Is there another possibility that I'm missing? >