Hi Markus!
1) Regarding rank, I observed here:
http://bickson.blogspot.com/2011/02/some-thoughts-about-accuracy-of-mahouts.html
that you need to request rank+1 to get the desired rank. So your runs with
--rank 4 are the correct ones.
2) There are two transformations that make comparing the results with
Matlab (or pen & paper) harder:

a) The scaleFactor. Defined in:
./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosState.java
I quote a documentation remark in:
./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java:48

" * To avoid floating point overflow problems which arise in power-methods like Lanczos, an initial pass is made
 * through the input matrix to
 *   <li>generate a good starting seed vector by summing all the rows of the input matrix, and</li>
 *   <li>compute the trace(inputMatrix<sup>t</sup>*matrix)
 * This latter value, being the sum of all of the singular values, is used to rescale the entire matrix, effectively
 * forcing the largest singular value to be strictly less than one, and transforming floating point <em>overflow</em>
 * problems into floating point <em>underflow</em> (ie, very small singular values will become invisible, as they
 * will appear to be zero and the algorithm will terminate)."

Now - did you take the scale factor into account in your comparison? If not,
you will surely get different results.
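
To illustrate why this matters, here is a minimal Python sketch (plain standard library, not Mahout code). It assumes the matrix is simply divided by trace(A^T A), as the quoted comment suggests; how Mahout actually applies the factor should be checked in LanczosState/LanczosSolver. The helper names are made up for this example. Note that trace(A^T A) is just the sum of the squared entries of A.

```python
# Sketch only: shows how a global rescaling of the matrix shifts its
# eigenvalues. Dividing by trace(A^T A) is an assumption taken from the
# quoted Javadoc, not a copy of what Mahout does internally.

def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dominant_eigenvalue(m, iterations=200):
    """Power iteration: eigenvalue of largest magnitude of m."""
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        # Rayleigh quotient v^T m v (v has unit length here).
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return lam

A = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]]

# trace(A^T A) equals the sum of the squared entries of A.
scale = sum(x * x for row in A for x in row)
A_scaled = [[x / scale for x in row] for row in A]

lam = dominant_eigenvalue(A)                # eigenvalue of the original matrix
lam_scaled = dominant_eigenvalue(A_scaled)  # eigenvalue of the rescaled matrix

print("scale factor:        ", scale)
print("largest eigenvalue:  ", lam)
print("scaled eigenvalue:   ", lam_scaled)
print("scaled * scale:      ", lam_scaled * scale)
```

The eigenvalue of the rescaled matrix is the original one divided by the scale, so any value read from a rescaled run has to be multiplied back before comparing it with Matlab or a hand calculation.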

b) The second transformation is orthogonalization of the resulting vector.
This step is optional (IMHO).
See:
./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java:118
The function call is: orthoganalizeAgainstAllButLast(nextVector, state);
Again I quote from documentation:
" * <p>This implementation uses {@link org.apache.mahout.math.matrix.linalg.EigenvalueDecomposition} to do the
 * eigenvalue extraction from the small (desiredRank x desiredRank) tridiagonal matrix.  Numerical stability is
 * achieved via brute-force: re-orthogonalization against all previous eigenvectors is computed after every pass.
 * This can be made smarter if (when!) this proves to be a major bottleneck.  Of course, this step can be parallelized
 * as well.
 * </p>"


Did you take orthogonalization into account when comparing? As far as I
recall, Matlab's eig() command does not perform this step.
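
For reference, "re-orthogonalization against all previous vectors" means roughly the following. This is a minimal Gram-Schmidt sketch in plain Python with hypothetical function names; it is not the code behind orthoganalizeAgainstAllButLast, only an illustration of the idea.

```python
# Sketch only: orthogonalize each new vector against all previously
# accepted basis vectors, then normalize it. Function names are
# hypothetical and do not correspond to Mahout's API.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def orthogonalize_against(vector, basis):
    """Remove from `vector` its component along each unit vector in `basis`."""
    result = list(vector)
    for b in basis:
        coeff = dot(result, b)
        result = [r - coeff * x for r, x in zip(result, b)]
    return result

# Use the rows of A from the original mail as example input vectors.
basis = []
for raw in ([1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]):
    basis.append(normalize(orthogonalize_against(raw, basis)))

# All pairwise dot products should now be (numerically) zero.
for i in range(len(basis)):
    for j in range(i + 1, len(basis)):
        print(i, j, abs(dot(basis[i], basis[j])) < 1e-9)
```

After this step the vectors are mutually orthogonal and have unit length, so they can differ from unnormalized hand-computed eigenvectors by scaling and sign even when they point along the same directions.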

Let me know if you have further questions.

Best,

Danny Bickson


On Fri, Sep 23, 2011 at 4:37 AM, Markus Holtermann <i...@markusholtermann.eu> wrote:

> Hello there,
>
> I'm trying to run Mahout's Singular Value Decomposition but realized
> that the resulting eigenvalues are wrong in most cases. So I took two
> small 3x3 matrices, calculated their eigenvalues and eigenvectors by
> hand, and compared the results to Mahout.
>
> In only one of eight cases did the results from Mahout and my pen &
> paper calculation match.
>
> Let's take
>    A = {{1,2,3},{2,4,5},{3,5,6}}
> and
>    B = {{5,2,4},{-3,6,2},{3,-3,1}}
>
> As you can see, A is symmetric, B is not.
>
> I ran `mahout svd --output out/ --numRows 3 --numCols 3` eight times
> with different arguments:
>
> 1) --input A --rank 3 --symmetric true    result is wrong
> 2) --input A --rank 4 --symmetric true    result is wrong
> 3) --input A --rank 3 --symmetric false   result is wrong
> 4) --input A --rank 4 --symmetric false   result is CORRECT
>
> 5) --input B --rank 3 --symmetric true    result is wrong
> 6) --input B --rank 4 --symmetric true    result is wrong
> 7) --input B --rank 3 --symmetric false   result is wrong
> 8) --input B --rank 4 --symmetric false   result is wrong
>
> To verify that my input data is correct, this is the result of `mahout
> seqdumper`
>
> For A:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
> Count: 3
>
>
> For B:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
> Count: 3
>
>
> And finally, the correct eigenvalues should be:
> For A:
> λ1 = 11.3448
> λ2 = -0.515729
> λ3 = 0.170915
>
> For B:
> λ1 = 7
> λ2 = 3
> λ3 = 2
>
> So, are there any known bugs in Mahout's SVD implementation? Am I doing
> something wrong? Is this algorithm known to produce wrong results?
>
> Thanks in advance.
>
> Markus
>
