Hi Markus!

1) Regarding rank, I observed here:
http://bickson.blogspot.com/2011/02/some-thoughts-about-accuracy-of-mahouts.html
that you need to request rank+1 to get the desired rank. So your runs with --rank 4 are the correct ones.

2) There are two transformations which make comparing the results with Matlab (or pen & paper) harder:

a) The scaleFactor. Defined in:
./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosState.java

I quote a documentation remark in:
./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java:48

"To avoid floating point overflow problems which arise in power-methods like Lanczos, an initial pass is made through the input matrix to
  - generate a good starting seed vector by summing all the rows of the input matrix, and
  - compute the trace(inputMatrix^t * matrix)
This latter value, being the sum of all of the singular values, is used to rescale the entire matrix, effectively forcing the largest singular value to be strictly less than one, and transforming floating point overflow problems into floating point underflow (ie, very small singular values will become invisible, as they will appear to be zero and the algorithm will terminate)."

Now, did you take the scale factor into account in your comparison? If not, you will surely get different results.

b) The second transformation is orthogonalization of the resulting vector. This step is optional (IMHO). See:
./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java:118

The function call is:
orthoganalizeAgainstAllButLast(nextVector, state);

Again I quote from the documentation:

"This implementation uses {@link org.apache.mahout.math.matrix.linalg.EigenvalueDecomposition} to do the eigenvalue extraction from the small (desiredRank x desiredRank) tridiagonal matrix. Numerical stability is achieved via brute-force: re-orthogonalization against all previous eigenvectors is computed after every pass. This can be made smarter if (when!) this proves to be a major bottleneck. Of course, this step can be parallelized as well."

Did you take orthogonalization into account when comparing? Matlab's eig() command does not perform this step, as far as I recall.

Let me know if you have further questions.
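
To make point a) concrete, here is a minimal sketch in plain Python (not Mahout code). It computes trace(A^t * A) for the matrix A from your mail; this value is the sum of the squares of all entries of A, which equals the sum of the squared singular values. How exactly LanczosState applies the scaleFactor is not reproduced here. The sketch simply assumes the matrix is divided by a factor s, in which case every eigenvalue is divided by s as well and has to be multiplied back before comparing. All function names are made up for illustration, and the dominant eigenvalue is found with ordinary power iteration rather than Lanczos.

```python
# Illustrative sketch only, not Mahout code. All names are hypothetical.

def trace_ata(m):
    """trace(M^T M), i.e. the sum of the squares of all entries of M."""
    return sum(x * x for row in m for x in row)

def scale(m, s):
    """Divide every entry of M by s (the assumed form of the rescaling)."""
    return [[x / s for x in row] for row in m]

def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dominant_eigenvalue(m, iterations=200):
    """Plain power iteration; returns the Rayleigh quotient of the final vector."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = matvec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, matvec(m, v)))

A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
s = trace_ata(A)                               # 129 for this A
lam_scaled = dominant_eigenvalue(scale(A, s))  # eigenvalue of the rescaled matrix
lam = lam_scaled * s                           # undo the assumed scaling
print(s, lam_scaled, lam)
```

For this A the factor is 129, and the squares of your three hand-computed eigenvalues also sum to roughly 129, which is a quick consistency check. The eigenvalue of the rescaled matrix is much smaller than 11.3448, so comparing it directly against the pen & paper value would look like a wrong result.
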

Best,

Danny Bickson

On Fri, Sep 23, 2011 at 4:37 AM, Markus Holtermann <i...@markusholtermann.eu> wrote:

> Hello there,
>
> I'm trying to run Mahout's Singular Value Decomposition but realized,
> that the resulting eigenvalues are wrong in most cases. So I took two
> small 3x3 matrices and calculated their eigenvalues and eigenvectors by
> hand and compared the results to Mahout.
>
> Only in one of eight cases the results for Mahout and my pen & paper
> matched.
>
> Lets take
>   A = {{1,2,3},{2,4,5},{3,5,6}}
> and
>   B = {{5,2,4},{-3,6,2},{3,-3,1}}
>
> As you can see, A is symmetric, B is not.
>
> I ran `mahout svd --output out/ --numRows 3 --numCols 3` eight times
> with different arguments:
>
> 1) --input A --rank 3 --symmetric true    result is wrong
> 2) --input A --rank 4 --symmetric true    result is wrong
> 3) --input A --rank 3 --symmetric false   result is wrong
> 4) --input A --rank 4 --symmetric false   result is CORRECT
>
> 5) --input B --rank 3 --symmetric true    result is wrong
> 6) --input B --rank 4 --symmetric true    result is wrong
> 7) --input B --rank 3 --symmetric false   result is wrong
> 8) --input B --rank 4 --symmetric false   result is wrong
>
> To verify that my input data is correct, this is the result of `mahout
> seqdumper`
>
> For A:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
> Count: 3
>
>
> For B:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
> Count: 3
>
>
> And finally, the correct eigenvalues should be:
> For A:
> λ1 = 11.3448
> λ2 = -0.515729
> λ3 = 0.170915
>
> For B:
> λ1 = 7
> λ2 = 3
> λ3 = 2
>
> So, are there any known bugs in Mahout's SVD implementation? Am I doing
> something wrong? Is this algorithm known to produce wrong results?
>
> Thanks in advance.
>
> Markus
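
The hand-computed eigenvalues quoted above can be double-checked independently of Mahout: for any square matrix, the eigenvalues must sum to the trace and multiply to the determinant. The following is a small Python sketch of that check for the two 3x3 matrices from the mail. The helper names are made up for illustration, and the check only validates the pen & paper reference values, not Mahout's output.

```python
# Sanity check of the reference eigenvalues: their sum must equal the trace
# and their product must equal the determinant. Helper names are hypothetical.

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def product(values):
    """Product of a sequence of numbers."""
    p = 1
    for v in values:
        p *= v
    return p

A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
B = [[5, 2, 4], [-3, 6, 2], [3, -3, 1]]
eig_A = [11.3448, -0.515729, 0.170915]  # values quoted in the mail
eig_B = [7, 3, 2]                       # values quoted in the mail

for name, m, eig in (("A", A, eig_A), ("B", B, eig_B)):
    print(name, "trace:", trace(m), "sum of eigenvalues:", sum(eig))
    print(name, "det:", det3(m), "product of eigenvalues:", product(eig))
```

Both matrices pass: A has trace 11 and determinant -1, B has trace 12 and determinant 42, and the quoted eigenvalues match these up to rounding. So the reference values are fine, and any mismatch has to come from how the Mahout output is produced or interpreted.
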