-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey guys,

Thank you for all the information about Singular Value Decomposition.
The Lanczos algorithm seems to be a bad choice for small matrices, but
the Stochastic SVD with k = full rank and p = 0 (thanks to Dmitriy
Lyubimov for implementing that) works fine.

So far,

Markus

On 09/23/2011 08:46 PM, Dmitriy Lyubimov wrote:
> I already fixed full rank (p=0) on the trunk. It was just an
> invalid assertion; the algorithm isn't limiting that. So k=3 p=0
> should be ok now in the trunk.
>
> On Sep 23, 2011 8:34 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>> Markus,
>>
>> Try testing on a 20x20 matrix if you want to use p>0. The issue
>> is that this is an approximation algorithm that works for
>> reasonably high dimension. 3 is not reasonably high. 20 is
>> probably marginal.
>>
>> On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov
>> <dlyubi...@apache.org> wrote:
>>
>>> oh, ok, apparently you need to use p>0.
>>>
>>> but then there's a problem that there's a k+p <= m (input height)
>>> requirement, so I guess this is a corner case I did not account
>>> for.
>>>
>>> you can use k=2 and p=1; the caveat is that even though 3
>>> singular values will be computed, only 2 of them will be saved.
>>> this solver always assumes "thin" decomposition requirements,
>>> although the distinction is purely technical; it is only a matter
>>> of a patch to enable p=0.
>>>
>>> It is only an issue because your input is so small. In practice,
>>> input is much "longer" than k+p rows, so it hasn't come up as an
>>> issue. Point is, it will not do full rank decomposition with
>>> small matrices; but then, you don't want to use it with small
>>> matrices :)
>>>
>>> although I can engineer a patch to allow p=0 and full rank
>>> decompositions for short wide matrices if it is that
>>> important.
>>>
>>> -dmitriy
>>>
>>> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann
>>> <i...@markusholtermann.eu> wrote:
>>>> Thank you for all your responses.
>>>>
>>>> ref. Dan Brickley:
>>>> ------------------
>>>> hopefully you did dream ;-)
>>>>
>>>> ref. Dmitriy Lyubimov:
>>>> ----------------------
>>>> When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
>>>> IllegalArgumentException. You can find the traceback at
>>>> http://paste.pocoo.org/show/481168/ .
>>>>
>>>> ref. Ted Dunning:
>>>> -----------------
>>>> I am running the M/R version of SVD in local mode. I didn't
>>>> install Hadoop except what is coming via `mvn install`. If I
>>>> understand the code correctly, the `--inMemory` argument is only
>>>> relevant for the "EigenVerificationJob" -- I didn't run that.
>>>>
>>>> Here are the latest results for the calculations as described
>>>> in my previous mail:
>>>>
>>>> For 1:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611:
>>>> {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.0:
>>>> {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
>>>> Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815:
>>>> {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
>>>> Count: 3
>>>>
>>>> For 2:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082:
>>>> {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976:
>>>> {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>>>> {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
>>>> Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533:
>>>> {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
>>>> Count: 4
>>>>
>>>> For 3:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587:
>>>> {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231:
>>>> {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>>>> {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
>>>> Count: 3
>>>>
>>>> For 4:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208:
>>>> {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401:
>>>> {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022:
>>>> {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
>>>> Key: 3: Value: eigenVector3, eigenvalue = 0.0:
>>>> {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
>>>> Count: 4
>>>>
>>>> For 5:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315:
>>>> {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.0:
>>>> {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
>>>> Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133:
>>>> {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
>>>> Count: 3
>>>>
>>>> For 6:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882:
>>>> {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203:
>>>> {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>>>> {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
>>>> Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079:
>>>> {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
>>>> Count: 4
>>>>
>>>> For 7:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162:
>>>> {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868:
>>>> {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
>>>> {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
>>>> Count: 3
>>>>
>>>> For 8:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663:
>>>> {0:NaN,1:NaN,2:NaN}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002:
>>>> {0:NaN,1:NaN,2:NaN}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679:
>>>> {0:NaN,1:NaN,2:NaN}
>>>> Key: 3: Value: eigenVector3, eigenvalue = 0.0:
>>>> {0:NaN,1:NaN,2:NaN}
>>>> Count: 4
>>>>
>>>>
>>>> ref. Danny Bickson:
>>>> -------------------
>>>> Thanks for your confirmation on how to use the rank. Regarding
>>>> the scale factor and orthogonalization: Yes, I take it into
>>>> account. I'm running SVD from trunk without any changes. And
>>>> even after commenting out those parts of the code, the results
>>>> are still wrong in cases 1, 2, 3, 7 and 8.
>>>>
>>>> Thank you for your help.
>>>>
>>>> Markus
>>>>
>>>>
>>>>> On 22 Sep 2011, at 18:37, Markus Holtermann
>>>>> <i...@markusholtermann.eu> wrote:
>>>>>
>>>>>> Hello there,
>>>>>>
>>>>>> I'm trying to run Mahout's Singular Value Decomposition
>>>>>> but realized that the resulting eigenvalues are wrong in
>>>>>> most cases. So I took two small 3x3 matrices, calculated
>>>>>> their eigenvalues and eigenvectors by hand, and compared
>>>>>> the results to Mahout.
>>>>>>
>>>>>> In only one of eight cases did the results from Mahout and
>>>>>> my pen & paper match.
>>>>>>
>>>>>> Let's take
>>>>>> A = {{1,2,3},{2,4,5},{3,5,6}} and
>>>>>> B = {{5,2,4},{-3,6,2},{3,-3,1}}
>>>>>>
>>>>>> As you can see, A is symmetric, B is not.
>>>>>>
>>>>>> I ran `mahout svd --output out/ --numRows 3 --numCols 3`
>>>>>> eight times with different arguments:
>>>>>>
>>>>>> 1) --input A --rank 3 --symmetric true    result is wrong
>>>>>> 2) --input A --rank 4 --symmetric true    result is wrong
>>>>>> 3) --input A --rank 3 --symmetric false   result is wrong
>>>>>> 4) --input A --rank 4 --symmetric false   result is CORRECT
>>>>>>
>>>>>> 5) --input B --rank 3 --symmetric true    result is wrong
>>>>>> 6) --input B --rank 4 --symmetric true    result is wrong
>>>>>> 7) --input B --rank 3 --symmetric false   result is wrong
>>>>>> 8) --input B --rank 4 --symmetric false   result is wrong
>>>>>>
>>>>>> To verify that my input data is correct, this is the
>>>>>> result of `mahout seqdumper`:
>>>>>>
>>>>>> For A:
>>>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>>>> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
>>>>>> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
>>>>>> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
>>>>>> Count: 3
>>>>>>
>>>>>>
>>>>>> For B:
>>>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>>>> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
>>>>>> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
>>>>>> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
>>>>>> Count: 3
>>>>>>
>>>>>>
>>>>>> And finally, the correct eigenvalues should be:
>>>>>>
>>>>>> For A:
>>>>>> λ1 = 11.3448
>>>>>> λ2 = -0.515729
>>>>>> λ3 = 0.170915
>>>>>>
>>>>>> For B:
>>>>>> λ1 = 7
>>>>>> λ2 = 3
>>>>>> λ3 = 2
>>>>>>
>>>>>> So, are there any known bugs in Mahout's SVD
>>>>>> implementation? Am I doing something wrong? Is this
>>>>>> algorithm known to produce wrong results?
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>> Markus

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6DhEIACgkQA8JzLzUe2LNSHwCgpc/ZgUXPaq0aNwrbcPGH4AXB
MVgAnjrgbceGHNHcHheCPPGydoAvcr57
=DBHE
-----END PGP SIGNATURE-----
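
For anyone wanting to re-check the pen-and-paper eigenvalues quoted in the thread without Mahout, here is a minimal Python sketch. It is not part of the original messages and does not reproduce what Mahout does internally. The helper names (`det3`, `char_poly`) are made up for illustration. It evaluates the characteristic polynomial det(M - λI) at each quoted value; a value near zero means the quoted λ is (approximately) an eigenvalue. It also checks one observation that may or may not explain case 8: 7.0 squared is an exact eigenvalue of BᵀB, so 7.0 is also a singular value of B.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly(m, lam):
    """Evaluate det(M - lam*I) for a 3x3 matrix M (hypothetical helper)."""
    shifted = [[m[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)


# Matrices exactly as given in the original question.
A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
B = [[5, 2, 4], [-3, 6, 2], [3, -3, 1]]

# Eigenvalues quoted in the thread (A's are rounded, B's are exact integers).
quoted_A = [11.3448, -0.515729, 0.170915]
quoted_B = [7, 3, 2]

residuals_A = [char_poly(A, lam) for lam in quoted_A]
residuals_B = [char_poly(B, lam) for lam in quoted_B]

# B^T B, whose eigenvalues are the squared singular values of B.
BtB = [[sum(B[k][i] * B[k][j] for k in range(3)) for j in range(3)]
       for i in range(3)]
residual_BtB_49 = char_poly(BtB, 49)  # 49 = 7.0 ** 2

print("det(A - lam*I) at quoted values:", residuals_A)
print("det(B - lam*I) at quoted values:", residuals_B)
print("det(B^T B - 49*I):", residual_BtB_49)
```

The residuals for A are small but not exactly zero, because the quoted values are rounded to a few digits. For a symmetric matrix such as A, the singular values are the absolute values of the eigenvalues, which is consistent with case 4 reporting 0.5157... where the eigenvalue is -0.515729.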