Hi, First of all, thanks for your replies.
Fernando, I'm using 0.4 version but I couldn't find the source code. I tried to get to the .java of this package: org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver But I couln't find the file. Ted, In this project I would have to work with matrix of 10^9, which have a very sparse data. I'm just in the begining, trying to see how can make this work, so I don't have a real example of data to calculate. I'm just checking how the LanczosSolver works because the amount of data that I would have to manage is extremely big. I would need something like the DistributedLanzcosSolver to calculate the eigenValues and eigenVectors. Which is the data I need to work in Diffusion Maps. K.R. Pedro. > From: [email protected] > Date: Fri, 19 Nov 2010 08:08:31 -0800 > Subject: Re: Lanczos Algorithm > To: [email protected] > > First, the Lanczos decomposition in Mahout is pretty new. It has passed > some tests, but has a very reasonable probability of significant bugs, > especially if you are giving it a matrix that does not have a strongly > decreasing sequence of singular values. > > Remember also that the singular values are not necessarily returned in the > order you might expect. > > Can you say a bit more about your data? How large? How sparse? > > This may also be of interest to you: > https://issues.apache.org/jira/browse/MAHOUT-376 > > I see this stochastic decomposition as the way forward to larger > decomposition than can be done with the Lanczos solver. > > 2010/11/19 Fernando Fernández <[email protected]> > > > Hi Pedro, > > > > Which version of Mahout are you working with? I think > > DistributedLanczosSolver is available from from version 0.4. I have more or > > less the same doubts about the results as you have, so i'm willing to read > > the answers to these questions :) > > > > > > Best. > > Fernando. > > > > 2010/11/19 PEDRO MANUEL JIMENEZ RODRIGUEZ <[email protected]> > > > > > > > > Dear Mahout developers, > > > > > > I'm a Computer Science student from the National University of Distance > > > Education in Spain. I'm currently developing my final year project which > > is > > > about Diffusion Maps. > > > > > > This method is used for dimensionality reduction and it uses the Lanczos > > > algorithm during its operations. The method is already implemented in the > > > last release version > > > of Mahout in the LanczosSolver class but we foresee the need to use the > > > algorithm with distributed calculations. This implementation of Diffusion > > > Maps has to deal > > > with extremely large matrices and the distributed calculation is critical > > > for me. > > > > > > I have noticed that there is a DistributedLanzcosSolver class implemented > > > in the Mahout library but I can’t have access to the source code because > > it > > > isn't in the > > > last release version of Mahout. > > > > > > > > > Could you please let me know if I could have access to the source code of > > > this class?Also I would like to ask you about how the LanczosSolver > > > implementation works. I have made some test between this class and other > > > program which has been implemented in R. This program is using a library > > > called Arpack, which also uses the Lanczos algorithm. When I calculate > > the > > > eigenvalues and the eigenvectors of a symmetric matrix. I haven’t the > > same > > > results. For example: > > > For this matrix: > > > > > > > > > 4.42282138 1.51744077 0.07690571 0.93650042 2.19609401 > > > 1.51744077 1.73849477 -0.11856149 0.76555191 1.3673608 > > > 0.07690571 -0.11856149 0.55065932 1.72163263 -0.2283693 > > > 0.93650042 0.76555191 1.72163263 0.09470345 -1.16626194 > > > 2.19609401 1.3673608 -0.2283693 -1.16626194 -0.37321311 > > > Results for R: > > > > > > > > > Eigenvalues > > > > > > -0.6442398 1.1084103 2.3946915 6.2018925 > > > > > > Eigenvectors [,1] [,2] [,3] [,4] > > > > > > [1,] -0.17050824 0.46631043 -0.010360993 0.83660453 > > > [2,] -0.06455473 -0.87762807 -0.008814402 0.40939079 > > > [3,] 0.68602882 0.04706265 -0.666429293 0.02602181 > > > [4,] -0.39567054 -0.07491643 -0.670834157 0.12161492 > > > [5,] 0.58272541 -0.06705358 0.325066897 0.34208875 > > > > > > > > > Results for Java: > > > > > > > > > Eigenvalues > > > > > > 0.0 0.007869004183962289 0.023293016691817894 0.10872358093523908 > > > 0.13087002850143611 > > > I never get the same eigenvalues, I think this is because the > > documentation > > > of the class says: > > > To avoid floating point overflow problems which arise in power-methods > > like > > > Lanczos, an initial pass is made through the input matrix to generate a > > good > > > starting seed vector by summing all the rows of the input matrix, and > > > compute the trace(inputMatrixt*matrix) > > > This latter value, being the sum of all of the singular values, is used > > to > > > rescale the entire matrix, effectively forcing the largest singular value > > to > > > be strictly > > > less than one, and transforming floating point overflow problems into > > > floating point underflow (ie, very small singular values will become > > > invisible, as they will > > > appear to be zero and the algorithm will terminate). > > > Is it possible to return the eigenvalues to theirs original value? > > > Eigenvectors > > > > > > -0.83660453 0.23122937 0.010360993 0.46631043 -0.17050824 > > > -0.40939079 0.24067227 0.008814402 -0.87762807 -0.06455473 > > > -0.02602181 0.28695718 0.666429293 0.04706265 0.68602882 > > > -0.12161492 -0.61075665 0.670834157 -0.07491643 -0.39567054 > > > -0.34208875 -0.65821099 -0.325066897 -0.06705358 0.58272541 > > > > > > > > > > > > Always happens the same. I have to force the calculation of N vectors > > (with > > > an N x N matrix) to obtain the same values for the eigenvectors, > > > except in the sign of some of the values, which is acceptable. I thought > > > this implementation of the algorithm should return the eigenvectors > > sorted > > > but all the time I’m obtaining a vector which I don’t want to calculate > > > between them. > > > In the example above it’s the second one starting from the left.Why is > > this > > > happen? > > > > > > Thanks in advance. > > > > > > K.r.Pedro > >
