Hi Pedro,

Which version of Mahout are you working with? I think
DistributedLanczosSolver is available from from version 0.4. I have more or
less the same doubts about the results as you have, so i'm willing to read
the answers to these questions :)


Best.
Fernando.

2010/11/19 PEDRO MANUEL JIMENEZ RODRIGUEZ <[email protected]>

>
> Dear Mahout developers,
>
> I'm a Computer Science student from the National University of Distance
> Education in Spain. I'm currently developing my final year project which is
> about Diffusion Maps.
>
> This method is used for dimensionality reduction and it uses the Lanczos
> algorithm during its operations. The method is already implemented in the
> last release version
> of Mahout in the LanczosSolver class but we foresee the need to use the
> algorithm with distributed calculations. This implementation of Diffusion
> Maps has to deal
> with extremely large matrices and the distributed calculation is critical
> for me.
>
> I have noticed that there is a DistributedLanzcosSolver class implemented
> in the Mahout library but I can’t have access to the source code because it
> isn't in the
> last release version of Mahout.
>
>
> Could you please let me know if I could have access to the source code of
> this class?Also I would like to ask you about how the LanczosSolver
> implementation works. I have made some test between this class and other
> program which has been implemented in R. This program is using a library
> called Arpack, which also uses the Lanczos algorithm. When I calculate the
> eigenvalues and the eigenvectors of a symmetric matrix. I haven’t the same
> results. For example:
> For this matrix:
>
>
> 4.42282138 1.51744077 0.07690571 0.93650042 2.19609401
> 1.51744077 1.73849477 -0.11856149 0.76555191 1.3673608
> 0.07690571 -0.11856149 0.55065932 1.72163263 -0.2283693
> 0.93650042 0.76555191 1.72163263 0.09470345 -1.16626194
> 2.19609401 1.3673608 -0.2283693 -1.16626194 -0.37321311
> Results for R:
>
>
> Eigenvalues
>
>  -0.6442398  1.1084103  2.3946915  6.2018925
>
> Eigenvectors            [,1]        [,2]         [,3]       [,4]
>
> [1,] -0.17050824  0.46631043 -0.010360993 0.83660453
> [2,] -0.06455473 -0.87762807 -0.008814402 0.40939079
> [3,]  0.68602882  0.04706265 -0.666429293 0.02602181
> [4,] -0.39567054 -0.07491643 -0.670834157 0.12161492
> [5,]  0.58272541 -0.06705358  0.325066897 0.34208875
>
>
> Results for Java:
>
>
> Eigenvalues
>
> 0.0 0.007869004183962289 0.023293016691817894 0.10872358093523908
> 0.13087002850143611
> I never get the same eigenvalues, I think this is because the documentation
> of the class says:
> To avoid floating point overflow problems which arise in power-methods like
> Lanczos, an initial pass is made through the input matrix to generate a good
> starting seed vector by summing all the rows of the input matrix, and
> compute the trace(inputMatrixt*matrix)
> This latter value, being the sum of all of the singular values, is used to
> rescale the entire matrix, effectively forcing the largest singular value to
> be strictly
> less than one, and transforming floating point overflow problems into
> floating point underflow (ie, very small singular values will become
> invisible, as they will
> appear to be zero and the algorithm will terminate).
> Is it possible to return the eigenvalues to theirs original value?
> Eigenvectors
>
> -0.83660453 0.23122937 0.010360993   0.46631043 -0.17050824
> -0.40939079 0.24067227 0.008814402  -0.87762807 -0.06455473
> -0.02602181 0.28695718 0.666429293   0.04706265 0.68602882
> -0.12161492 -0.61075665 0.670834157  -0.07491643 -0.39567054
> -0.34208875 -0.65821099 -0.325066897  -0.06705358 0.58272541
>
>
>
> Always happens the same. I have to force the calculation of N vectors (with
> an N x N matrix) to obtain the same values for the eigenvectors,
> except in the sign of some of the values, which is acceptable. I thought
> this implementation of the algorithm should return the eigenvectors sorted
> but all the time I’m obtaining a vector which I don’t want to calculate
> between them.
> In the example above it’s the second one starting from the left.Why is this
> happen?
>
> Thanks in advance.
>
> K.r.Pedro

Reply via email to