If one wanted the *smallest* eigenvectors in a (potentially rather large but sparse) matrix, what are the options in Mahout currently? Lanczos? ssvd?
Specifically the eigenvector corresponding to the second-smallest eigenvalue... (there's always a boring zero one) of laplacian re-representation of affinity matrix, in my case. Reading around, http://www.sfu.ca/personal/archives/richards/Pages/NAS.AJS-WDR.pdf "While the Laplacian spectrum has many attractive theoretical properties which make it very useful for partitioning large sparse arrays, it has at least one drawback: the Fiedler eigenvector is associated with the second smallest eigenvalue. The current method of choice for finding a few eigenvectors of very large sparse matrices is the Lanczos method, which tends not to find small eigenvectors as quickly as it finds the largest eigenvectors (Pothen, et al, 1990). " That paper also suggests a workaround for this, "The definition of L shows that there is no loss of sparsity (except for the diagonal) and that the sparse methods mentioned earlier can be applied to find all or some of the eigenpairs. The requirement that we must find the smallest eigenpairs is easily overcome by subtracting a suitably large constant from the diagonal of -L (which subtracts that constant from the eigenvalues without hanging the eigenvectors).This guarantees that the first eigenpairs returned by the Power Method or Lanczos iteration are associated with the smallest eigenvalues of L." There's a thesis at http://am.vsb.cz/theses/kabelikova_ing.pdf that seems to try this (or a related) method, but I didn't dig into the details. If I understand the literature correctly (I've been poking around spectral graph theory e.g. see http://www.cs.purdue.edu/homes/dgleich/demos/matlab/spectral/spectral.html for a fairly accessible matlab example), this second-smallest (aka fiedler) eigenvector can be very interesting for finding structure in graph datasets, even if intuitively we're drawn more towards the larger eigenvectors for explaining bulk variation in the data. This is all closely related to Mahout's spectral clustering component, but from talking with Shannon I don't believe we currently do anything to find the smallest eigenvectors there. Are Lanczos and SSVD Mahout's two natural starting points for this kind of thing, or am I forgetting some other machinery? Thanks for any pointers or fixes to my understanding, Dan
