Markus,

Try testing on a 20x20 matrix if you want to use p>0. The issue is that this is an approximation algorithm that works for reasonably high dimensions. 3 is not reasonably high. 20 is probably marginal.

On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov <dlyubi...@apache.org> wrote:

> oh, ok, apparently you need to use p>0.
>
> but then there's a problem: the solver requires k+p <= m (input
> height), so I guess this is a corner case i did not account for.
>
> you can use k=2 and p=1; the caveat is that even though 3 singular
> values will be computed, only 2 of them will be saved. this solver
> always assumes "thin" decomposition requirements, although the
> distinction is purely technical; it is only a matter of a patch to
> enable p=0.
>
> It is only an issue because your input is so small. In practice, input
> is much "longer" than k+p rows so it hasn't come up as an issue. Point
> is, it will not do full rank decomposition with small matrices; but
> then, you don't want to use it with small matrices :)
>
> although i can engineer a patch to allow p=0 and full rank
> decompositions for short wide matrices if it is that important.
>
> -dmitriy
>
> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann
> <i...@markusholtermann.eu> wrote:
> > Thank you for all your responses.
> >
> > ref. Dan Brickley:
> > ------------------
> > hopefully you did dream ;-)
> >
> > ref. Dmitriy Lyubimov:
> > ----------------------
> > When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
> > IllegalArgumentException. You can find the traceback at
> > http://paste.pocoo.org/show/481168/ .
> >
> > ref. Ted Dunning:
> > -----------------
> > I am running the M/R version of SVD in local mode. I didn't install
> > Hadoop except what is coming via `mvn install`.
> > If I understand the code correctly, the `--inMemory` argument is only
> > relevant for the "EigenVerificationJob" -- I didn't run that.
> >
> > Here are the latest results for the calculations as described in my
> > previous mail:
> >
> > For 1:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611:
> > {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.0:
> > {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
> > Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815:
> > {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
> > Count: 3
> >
> > For 2:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082:
> > {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976:
> > {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
> > Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533:
> > {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
> > Count: 4
> >
> > For 3:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587:
> > {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231:
> > {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
> > Count: 3
> >
> > For 4:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208:
> > {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401:
> > {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022:
> > {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
> > Key: 3: Value: eigenVector3, eigenvalue = 0.0:
> > {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
> > Count: 4
> >
> > For 5:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315:
> > {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.0:
> > {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
> > Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133:
> > {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
> > Count: 3
> >
> > For 6:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882:
> > {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
> > Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203:
> > {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
> > Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079:
> > {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
> > Count: 4
> >
> > For 7:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162:
> > {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
> > Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868:
> > {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
> > Count: 3
> >
> > For 8:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663:
> > {0:NaN,1:NaN,2:NaN}
> > Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002:
> > {0:NaN,1:NaN,2:NaN}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679:
> > {0:NaN,1:NaN,2:NaN}
> > Key: 3: Value: eigenVector3, eigenvalue = 0.0:
> > {0:NaN,1:NaN,2:NaN}
> > Count: 4
> >
> >
> > ref. Danny Bickson:
> > -------------------
> > Thanks for your confirmation on how to use the rank.
> > Regarding the scale factor and orthogonalization: Yes, I take it into
> > account. I'm running SVD from trunk without any changes. And even after
> > commenting out those parts of the code, the results are still wrong in
> > cases 1, 2, 3, 7 and 8.
> >
> > Thank you for your help.
> >
> > Markus
> >
> >
> >> On 22 Sep 2011, at 18:37, Markus Holtermann
> >> <i...@markusholtermann.eu> wrote:
> >>
> >>> Hello there,
> >>>
> >>> I'm trying to run Mahout's Singular Value Decomposition but
> >>> realized that the resulting eigenvalues are wrong in most cases.
> >>> So I took two small 3x3 matrices and calculated their eigenvalues
> >>> and eigenvectors by hand and compared the results to Mahout.
> >>>
> >>> In only one of eight cases did the results from Mahout match my
> >>> pen & paper calculation.
> >>>
> >>> Let's take A = {{1,2,3},{2,4,5},{3,5,6}} and B =
> >>> {{5,2,4},{-3,6,2},{3,-3,1}}
> >>>
> >>> As you can see, A is symmetric, B is not.
> >>>
> >>> I ran `mahout svd --output out/ --numRows 3 --numCols 3` eight
> >>> times with different arguments:
> >>>
> >>> 1) --input A --rank 3 --symmetric true   result is wrong
> >>> 2) --input A --rank 4 --symmetric true   result is wrong
> >>> 3) --input A --rank 3 --symmetric false  result is wrong
> >>> 4) --input A --rank 4 --symmetric false  result is CORRECT
> >>>
> >>> 5) --input B --rank 3 --symmetric true   result is wrong
> >>> 6) --input B --rank 4 --symmetric true   result is wrong
> >>> 7) --input B --rank 3 --symmetric false  result is wrong
> >>> 8) --input B --rank 4 --symmetric false  result is wrong
> >>>
> >>> To verify that my input data is correct, this is the result of
> >>> `mahout seqdumper`
> >>>
> >>> For A:
> >>> Key class: class org.apache.hadoop.io.IntWritable
> >>> Value Class: class org.apache.mahout.math.VectorWritable
> >>> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
> >>> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
> >>> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
> >>> Count: 3
> >>>
> >>>
> >>> For B:
> >>> Key class: class org.apache.hadoop.io.IntWritable
> >>> Value Class: class org.apache.mahout.math.VectorWritable
> >>> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
> >>> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
> >>> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
> >>> Count: 3
> >>>
> >>>
> >>> And finally, the correct eigenvalues should be:
> >>>
> >>> For A: λ1 = 11.3448, λ2 = -0.515729, λ3 = 0.170915
> >>>
> >>> For B: λ1 = 7, λ2 = 3, λ3 = 2
> >>>
> >>> So, are there any known bugs in Mahout's SVD implementation? Am I
> >>> doing something wrong? Is this algorithm known to produce wrong
> >>> results?
> >>>
> >>> Thanks in advance.
> >>>
> >>> Markus
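
For readers who want to re-check the pen & paper eigenvalues quoted at the end of the original message, here is a minimal sketch in plain Python. It does not use Mahout and does not reproduce any of its algorithms; it only evaluates the characteristic polynomial det(M - λI), which is zero exactly when λ is an eigenvalue of M. The helper names `det3` and `char_poly` are made up for this illustration.

```python
# Sanity check of the hand-computed eigenvalues quoted in the thread.
# Illustrative only: det3 and char_poly are hypothetical helper names,
# not part of Mahout or any other library.

def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(m, lam):
    """Evaluate det(m - lam*I) for a 3x3 matrix m."""
    shifted = [[m[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)

# The two input matrices from the original message.
A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
B = [[5, 2, 4], [-3, 6, 2], [3, -3, 1]]

# Eigenvalues as stated in the thread (those for A are rounded).
claimed_A = [11.3448, -0.515729, 0.170915]
claimed_B = [7, 3, 2]

residuals_A = [char_poly(A, lam) for lam in claimed_A]
residuals_B = [char_poly(B, lam) for lam in claimed_B]

for lam, res in zip(claimed_A, residuals_A):
    print(f"A: lambda = {lam:<10} det(A - lambda*I) = {res:.2e}")
for lam, res in zip(claimed_B, residuals_B):
    print(f"B: lambda = {lam:<10} det(B - lambda*I) = {res}")
```

The residuals for B come out exactly zero because the stated eigenvalues are integers. Those for A are small but nonzero, since the values in the message are rounded to about six significant digits.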