On 23 October 2011 07:16, Bae, Jae Hyeon <[email protected]> wrote: > I am implementing graph clustering algorithm based on hadoop and mahout. > This is my term project of data mining course.
Cool! Fun topic... > Spectral method of graph clustering needs calculation of eigenvectors, > which is not practically efficient with the large scale graph. Thus, there > exists multi-level graph clustering method without eigenvectors. This > contains graph coarsening, base clustering, refining. Refining stage can be > done with weighted kernel k-means clustering which is not so difficult to > be implemented in MapReduce way, but the problem is graph coarsening. > Pseudocode is on this paper > http://www.cs.utexas.edu/~ddn/papers/sui10.pdf Like any graph > processing algorithm, this algorithm does not look easy to > be intuitively implemented in MapReduce way. So, I need a help from experts > more proficient at converting single thread graph algorithm to MapReduce > way. If this work is done smoothly, I will contribute this graph clustering > algorithm to Mahout if I am allowed to do so. Do take a look at the Spectral Clustering code (and Eigencuts) already in Mahout. orig proposal: https://issues.apache.org/jira/browse/MAHOUT-363 http://code.google.com/p/eigencuts/ wiki: https://cwiki.apache.org/MAHOUT/spectral-clustering.html blog notes: http://spectrallyclustered.wordpress.com/ bugs: https://issues.apache.org/jira/browse/MAHOUT-524 javadoc: (maybe not freshest) http://javasourcecode.org/html/open-source/mahout/mahout-0.5/org/apache/mahout/clustering/spectral/common/package-summary.html The Wiki page in particular might be a good start, https://cwiki.apache.org/MAHOUT/spectral-clustering.html Code is in public svn: http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/ Take a look at the run() method in http://svn.apache.org/repos/asf/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java ...this takes in a textual representation of an affinity matrix, constructs the laplacian representation of it and uses the existing DistributedLanczosSolver/SVD (see https://cwiki.apache.org/MAHOUT/dimensional-reduction.html ) and K-Means components from elsewhere in Mahout. The code seems fairly well commented. Having said that, this code doesn't seem very happy currently. It seems per https://issues.apache.org/jira/browse/MAHOUT-524 difficult to get running in current Mahout trunk. I couldn't get it running yet. If you can find a way in your term project to build on this work, that would be fantastic... Hope this helps, cheers, Dan
