On Tue, Jul 31, 2012 at 6:00 PM, Aniruddha Basak <[email protected]> wrote:
> Hi,
>
> I am working on Spectral KMeans, which involves an eigen-decomposition
> step using Lanczos. As I did not get the results I expected, I tried to
> understand the implementation.
>
> I have one question about "LanczosSolver.java":
>
> In the "solve" method, while building the tridiagonal matrix, there is a
> step just after the multiplication job (performed on Hadoop as
> TimesSquaredJob):
>
> ------------
> nextVector.assign(new Scale(1.0 / state.getScaleFactor()));
> ------------
>
> I could not understand why this scaling is performed?

It's to help with floating-point overflow issues in naive Lanczos
implementations: each time you multiply the basis vectors by the matrix^2,
they grow roughly like the largest singular value of the matrix (squared).
Do this fifty or sixty times and you blow past 10^308 and you're screwed.
Scaling the basis vectors down after each iteration is equivalent to
uniformly normalizing the input matrix to have smaller singular values, and
it turns floating-point overflow problems into underflow problems.
Thankfully, when you have entries in your basis vectors which are close to
zero, it's ok to have them be *actually* zero -- unless you do it to all of
them, naturally. Either way, uniform scaling doesn't change the output
singular vectors, so the result is unchanged as long as you don't run into
floating-point precision issues.

> (When I compared the results on a small matrix to an equivalent Matlab
> script, I found the results are exactly the same WITHOUT this scaling.
> Including this scaling makes the results different from the Matlab
> results.)
>
> Thanks,
> Aniruddha

  -- -jake
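To make the overflow argument concrete, here is a minimal sketch (plain
Python, not Mahout code; the matrix, the max-abs scale factor, and the
helper names are all illustrative stand-ins for what state.getScaleFactor()
does). It runs a power-iteration-style loop with and without per-step
scaling and checks that both converge to the same direction, since uniform
scaling only controls the magnitude of the iterate, not the eigenvector it
points at:

```python
# Hypothetical illustration: rescaling each iterate prevents the norm from
# growing like (largest singular value)^k, while leaving the direction --
# the eigenvector estimate -- unchanged.

def mat_vec(A, v):
    # Plain dense matrix-vector product.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rescale(v):
    # Divide by the max-abs entry, playing the role of the scale factor
    # that LanczosSolver divides out after each multiplication job.
    s = max(abs(x) for x in v)
    return [x / s for x in v]

A = [[3.0, 1.0],
     [1.0, 2.0]]  # symmetric; dominant eigenvalue (5 + sqrt(5))/2 ~ 3.618

v_scaled = [1.0, 0.0]
v_raw = [1.0, 0.0]
for _ in range(60):
    v_scaled = rescale(mat_vec(A, v_scaled))  # stays bounded forever
    v_raw = mat_vec(A, v_raw)                 # norm grows ~3.618x per step

# Same direction either way; with a larger dominant eigenvalue the raw
# iterate would overflow to inf long before 60 steps.
print([round(x, 6) for x in v_scaled])
print([round(x, 6) for x in rescale(v_raw)])
```

With a dominant eigenvalue of only ~3.618 the unscaled iterate merely
reaches ~10^33 here; scale the matrix up and it hits inf, which is exactly
the failure mode the scaling step guards against.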
