Hi I am a graduate student at University of Texas at Austin CS department and a real fan of Mahout. I've been using mahout library for more than a year.
This semester, I implemented scalable graph clustering without eigenvector computation with multi-level approach as my data mining term project. Its main idea is that weighted kernel k-means clustering is mathematically equivalent to normalized cut graph clustering. So if we can find the optimal initial clustering, we can find graph clustering without eigenvector calculation. To find the optimal initial clustering, we coarsen the graph at multi-level with edge contraction method, and we apply spectral clustering (we can use any other graph clustering method) to the smallest graph. This is called *base clustering*. The clustering information of the previous graph will be the initial clustering of the next level graph. Using this approach, we can do graph cut without eigenvector calculation. Currently, Mahout has a spectral k-means clustering which needs eigenvector calculation using distributed Lancoz algorithm. When I tried this implementation with livejournal graph data, I'd waited for more than 15 hours but it didn't finish building matrix laplacian yet but my implementation was finished within 12 hours. I didn't implement base clustering. In my experiment, I used the legacy graph clustering package "Graclus" written in C but we can use Mahout spectral k-means clustering as the base clustering phase. I want to contribute my implementation to Mahout if it is available and allowed. Please let me know how I can follow up. Thank you Best, Jae
