Dear Mahout developers,

I'm a Computer Science student at the National University of Distance 
Education in Spain, and I'm currently developing my final year project, which 
is about Diffusion Maps.

Diffusion Maps is a dimensionality reduction method that relies on the Lanczos 
algorithm. The algorithm is already implemented in the latest release of 
Mahout in the LanczosSolver class, but I foresee the need to run it as a 
distributed computation: this implementation of Diffusion Maps has to deal 
with extremely large matrices, so distributed calculation is critical for me.

I have noticed that there is a DistributedLanczosSolver class in the Mahout 
library, but I can't access its source code because it isn't part of the 
latest release of Mahout.


Could you please let me know if I could have access to the source code of this 
class?

I would also like to ask how the LanczosSolver implementation works. I have 
run some tests comparing this class with a program implemented in R that uses 
the ARPACK library, which is also based on the Lanczos algorithm. When I 
calculate the eigenvalues and eigenvectors of a symmetric matrix, I don't get 
the same results. For example:
For this matrix:


4.42282138 1.51744077 0.07690571 0.93650042 2.19609401 
1.51744077 1.73849477 -0.11856149 0.76555191 1.3673608 
0.07690571 -0.11856149 0.55065932 1.72163263 -0.2283693 
0.93650042 0.76555191 1.72163263 0.09470345 -1.16626194 
2.19609401 1.3673608 -0.2283693 -1.16626194 -0.37321311 
Results for R:


Eigenvalues

 -0.6442398  1.1084103  2.3946915  6.2018925

Eigenvectors

            [,1]        [,2]         [,3]       [,4]
[1,] -0.17050824  0.46631043 -0.010360993 0.83660453
[2,] -0.06455473 -0.87762807 -0.008814402 0.40939079
[3,]  0.68602882  0.04706265 -0.666429293 0.02602181
[4,] -0.39567054 -0.07491643 -0.670834157 0.12161492
[5,]  0.58272541 -0.06705358  0.325066897 0.34208875
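
In case it helps to reproduce these reference values without R, a dense 
eigensolver should give the same spectrum (plus the fifth eigenvalue that the 
ARPACK run above does not return). This is only a rough sketch, assuming the 
Apache Commons Math 3.x API (class and method names from memory):

import org.apache.commons.math3.linear.EigenDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;

public class DenseEigenCheck {
    public static void main(String[] args) {
        // The 5 x 5 symmetric test matrix from above.
        double[][] a = {
            { 4.42282138,  1.51744077,  0.07690571,  0.93650042,  2.19609401 },
            { 1.51744077,  1.73849477, -0.11856149,  0.76555191,  1.3673608  },
            { 0.07690571, -0.11856149,  0.55065932,  1.72163263, -0.2283693  },
            { 0.93650042,  0.76555191,  1.72163263,  0.09470345, -1.16626194 },
            { 2.19609401,  1.3673608,  -0.2283693,  -1.16626194, -0.37321311 }
        };

        RealMatrix m = MatrixUtils.createRealMatrix(a);
        EigenDecomposition eig = new EigenDecomposition(m);

        // Print each eigenvalue next to its eigenvector.
        double[] values = eig.getRealEigenvalues();
        for (int i = 0; i < values.length; i++) {
            System.out.println(values[i] + "  " + eig.getEigenvector(i));
        }
    }
}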


Results for Java:


Eigenvalues

0.0 0.007869004183962289 0.023293016691817894 0.10872358093523908 0.13087002850143611

I never get the same eigenvalues. I think this is because of what the 
documentation of the class says:

"To avoid floating point overflow problems which arise in power-methods like 
Lanczos, an initial pass is made through the input matrix to generate a good 
starting seed vector by summing all the rows of the input matrix, and compute 
the trace(inputMatrix^t * matrix). This latter value, being the sum of all of 
the singular values, is used to rescale the entire matrix, effectively forcing 
the largest singular value to be strictly less than one, and transforming 
floating point overflow problems into floating point underflow (ie, very small 
singular values will become invisible, as they will appear to be zero and the 
algorithm will terminate)."

Is it possible to return the eigenvalues to their original values?
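
To make my question concrete, this is how I would try to undo the rescaling, 
assuming the solver simply divides the whole input matrix by 
trace(inputMatrix^t * matrix) and then reports the eigenvalues of the scaled 
matrix; this is only my guess at what happens internally, which is part of 
what I am asking:

public class RescaleGuess {

    // trace(A^t * A) is just the sum of the squares of all entries of A (A is real).
    static double traceAtA(double[][] a) {
        double sum = 0.0;
        for (double[] row : a) {
            for (double v : row) {
                sum += v * v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The same 5 x 5 symmetric test matrix as above.
        double[][] a = {
            { 4.42282138,  1.51744077,  0.07690571,  0.93650042,  2.19609401 },
            { 1.51744077,  1.73849477, -0.11856149,  0.76555191,  1.3673608  },
            { 0.07690571, -0.11856149,  0.55065932,  1.72163263, -0.2283693  },
            { 0.93650042,  0.76555191,  1.72163263,  0.09470345, -1.16626194 },
            { 2.19609401,  1.3673608,  -0.2283693,  -1.16626194, -0.37321311 }
        };

        double scale = traceAtA(a);

        // Eigenvalues reported by LanczosSolver for this matrix (copied from my run above).
        double[] reported = { 0.0, 0.007869004183962289, 0.023293016691817894,
                              0.10872358093523908, 0.13087002850143611 };

        // If the solver really returned the eigenvalues of A / scale, then multiplying
        // by scale should recover the eigenvalues of A, because eig(A / c) = eig(A) / c
        // for any nonzero scalar c.
        for (double lambda : reported) {
            System.out.println(lambda + "  ->  " + lambda * scale);
        }
    }
}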
Eigenvectors

-0.83660453 0.23122937 0.010360993   0.46631043 -0.17050824
-0.40939079 0.24067227 0.008814402  -0.87762807 -0.06455473 
-0.02602181 0.28695718 0.666429293   0.04706265 0.68602882 
-0.12161492 -0.61075665 0.670834157  -0.07491643 -0.39567054 
-0.34208875 -0.65821099 -0.325066897  -0.06705358 0.58272541 



The same thing always happens: I have to force the calculation of N vectors 
(for an N x N matrix) to obtain the same values for the eigenvectors, except 
for the sign of some of them, which is acceptable. I thought this 
implementation of the algorithm would return the eigenvectors sorted, but I 
always get an extra vector among them that I did not want to calculate; in the 
example above it is the second one counting from the left. Why does this 
happen?
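
For clarity, a sign-insensitive comparison like the following is what I mean 
by "the same values except for the sign" (just a sketch; the two vectors in 
main are column [,4] of the R output and the matching column from the Java 
output above):

public class SignInsensitiveCompare {

    // True if u and v are parallel (equal up to a sign flip), within tol:
    // |u . v| equals ||u|| * ||v|| exactly when the two vectors are parallel.
    static boolean sameUpToSign(double[] u, double[] v, double tol) {
        double dot = 0.0, normU = 0.0, normV = 0.0;
        for (int i = 0; i < u.length; i++) {
            dot   += u[i] * v[i];
            normU += u[i] * u[i];
            normV += v[i] * v[i];
        }
        return Math.abs(Math.abs(dot) - Math.sqrt(normU * normV)) < tol;
    }

    public static void main(String[] args) {
        double[] fromR    = {  0.83660453,  0.40939079,  0.02602181,  0.12161492,  0.34208875 };
        double[] fromJava = { -0.83660453, -0.40939079, -0.02602181, -0.12161492, -0.34208875 };
        System.out.println(sameUpToSign(fromR, fromJava, 1e-6)); // prints true
    }
}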

Thanks in advance.

Kind regards,
Pedro
