Markus,

Try testing on a 20x20 matrix if you want to use p>0.  The issue is that
this is an approximation algorithm that works for reasonably high dimensions.
3 is not reasonably high, and 20 is probably marginal.
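
To make the sizing issue concrete, here is a minimal sketch of the constraint Dmitriy describes in the message quoted below. It assumes the requirement is k + p <= m, with m the number of input rows; `check_ssvd_rank` is a hypothetical helper written for illustration, not a Mahout function.

```python
def check_ssvd_rank(k: int, p: int, m: int) -> bool:
    """Return True if k + p fits within m input rows (assumed constraint)."""
    if k <= 0 or p < 0:
        raise ValueError("k must be positive and p non-negative")
    return k + p <= m


# A 3x3 input cannot satisfy k=3 with any p>0, but k=2 with p=1 fits.
print(check_ssvd_rank(3, 1, 3))
print(check_ssvd_rank(2, 1, 3))
# A 20x20 input leaves room for k=10 plus some oversampling.
print(check_ssvd_rank(10, 5, 20))
```

Under that assumption, a 20x20 matrix can take k=10 with a few extra oversampling columns, which a 3x3 matrix cannot.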

On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov <dlyubi...@apache.org> wrote:

> oh, ok, apparently you need to use p>0.
>
> but then there's a problem: there's a k+p <= m (input height)
> requirement, so I guess this is a corner case i did not account for.
>
> you can use k=2 and p=1; the caveat is that even though 3 singular
> values will be computed, only 2 of them will be saved. this solver
> always assumes "thin" decomposition requirements, although the
> distinction is purely technical; it is only a matter of a patch to
> enable p=0.
>
> It is only an issue because your input is so small. In practice, input is
> much "longer" than k+p rows so it hasn't come up as an issue. Point
> is, it will not do full rank decomposition with small matrices; but
> then, you don't want to use it with small matrices :)
>
> although i can engineer a patch to allow p=0 and full rank
> decompositions for short wide matrices if it is that important.
>
> -dmitriy
>
> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann
> <i...@markusholtermann.eu> wrote:
> > Thank you for all your responses.
> >
> > ref. Dan Brickley:
> > ------------------
> > hopefully you did dream ;-)
> >
> > ref. Dmitriy Lyubimov:
> > ----------------------
> > When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
> > IllegalArgumentException. You can find the traceback at
> > http://paste.pocoo.org/show/481168/ .
> >
> > ref. Ted Dunning:
> > -----------------
> > I am running the M/R version of SVD in local mode. I didn't install
> > Hadoop except what is coming via `mvn install`.
> > If I understand the code correctly, the `--inMemory` argument is only
> > relevant for the "EigenVerificationJob" -- I didn't run that.
> >
> > Here are the latest results for the calculations as described in my
> > previous mail:
> >
> > For 1:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611:
> > {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.0:
> > {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
> > Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815:
> > {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
> > Count: 3
> >
> > For 2:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082:
> > {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976:
> > {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
> > Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533:
> > {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
> > Count: 4
> >
> > For 3:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587:
> > {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231:
> > {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
> > Count: 3
> >
> > For 4:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208:
> > {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401:
> > {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022:
> > {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
> > Key: 3: Value: eigenVector3, eigenvalue = 0.0:
> > {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
> > Count: 4
> >
> > For 5:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315:
> > {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
> > Key: 1: Value: eigenVector1, eigenvalue = 0.0:
> > {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
> > Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133:
> > {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
> > Count: 3
> >
> > For 6:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882:
> > {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
> > Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203:
> > {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
> > Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079:
> > {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
> > Count: 4
> >
> > For 7:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162:
> > {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
> > Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868:
> > {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> > {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
> > Count: 3
> >
> > For 8:
> > Key class: class org.apache.hadoop.io.IntWritable
> > Value Class: class org.apache.mahout.math.VectorWritable
> > Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663:
> > {0:NaN,1:NaN,2:NaN}
> > Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002:
> > {0:NaN,1:NaN,2:NaN}
> > Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679:
> > {0:NaN,1:NaN,2:NaN}
> > Key: 3: Value: eigenVector3, eigenvalue = 0.0:
> > {0:NaN,1:NaN,2:NaN}
> > Count: 4
> >
> >
> > ref. Danny Bickson:
> > -------------------
> > Thanks for your confirmation on how to use the rank.
> > Regarding the scale factor and orthogonalization: Yes, I take it into
> > account. I'm running SVD from trunk without any changes. And even after
> > commenting out those parts of the code, the results are still wrong in
> > cases 1, 2, 3, 7 and 8.
> >
> > Thank you for your help.
> >
> > Markus
> >
> >
> >> On 22 Sep 2011, at 18:37, Markus Holtermann
> >> <i...@markusholtermann.eu> wrote:
> >>
> >>> Hello there,
> >>>
> >>> I'm trying to run Mahout's Singular Value Decomposition but
> >>> realized that the resulting eigenvalues are wrong in most cases.
> >>> So I took two small 3x3 matrices, calculated their eigenvalues
> >>> and eigenvectors by hand, and compared the results to Mahout.
> >>>
> >>> In only one of eight cases did Mahout's results match my
> >>> pen-and-paper calculation.
> >>>
> >>> Let's take A = {{1,2,3},{2,4,5},{3,5,6}} and B =
> >>> {{5,2,4},{-3,6,2},{3,-3,1}}
> >>>
> >>> As you can see, A is symmetric, B is not.
> >>>
> >>> I ran `mahout svd --output out/ --numRows 3 --numCols 3` eight
> >>> times with different arguments:
> >>>
> >>> 1) --input A --rank 3 --symmetric true    result is wrong
> >>> 2) --input A --rank 4 --symmetric true    result is wrong
> >>> 3) --input A --rank 3 --symmetric false   result is wrong
> >>> 4) --input A --rank 4 --symmetric false   result is CORRECT
> >>>
> >>> 5) --input B --rank 3 --symmetric true    result is wrong
> >>> 6) --input B --rank 4 --symmetric true    result is wrong
> >>> 7) --input B --rank 3 --symmetric false   result is wrong
> >>> 8) --input B --rank 4 --symmetric false   result is wrong
> >>>
> >>> To verify that my input data is correct, this is the result of
> >>> `mahout seqdumper`
> >>>
> >>> For A:
> >>> Key class: class org.apache.hadoop.io.IntWritable
> >>> Value Class: class org.apache.mahout.math.VectorWritable
> >>> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
> >>> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
> >>> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
> >>> Count: 3
> >>>
> >>> For B:
> >>> Key class: class org.apache.hadoop.io.IntWritable
> >>> Value Class: class org.apache.mahout.math.VectorWritable
> >>> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
> >>> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
> >>> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
> >>> Count: 3
> >>>
> >>>
> >>> And finally, the correct eigenvalues should be:
> >>>
> >>> For A: λ1 = 11.3448, λ2 = -0.515729, λ3 = 0.170915
> >>>
> >>> For B: λ1 = 7, λ2 = 3, λ3 = 2
> >>>
> >>> So, are there any known bugs in Mahout's SVD implementation? Am I
> >>> doing something wrong? Is this algorithm known to produce wrong
> >>> results?
> >>>
> >>> Thanks in advance.
> >>>
> >>> Markus
> >
> >
>
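
One way to sanity-check the numbers in this thread without Mahout is to evaluate the characteristic polynomial det(M - λI) at each claimed value; it is zero exactly at an eigenvalue. The sketch below is plain Python, and the helper names (`det3`, `char_poly`, `gram`) are made up for illustration. It also tests the values reported for case 8 against B^T B, on the assumption that an SVD job reports singular values rather than eigenvalues; the two differ for a non-symmetric matrix such as B.

```python
# Matrices from the original message.
A = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
B = [[5, 2, 4], [-3, 6, 2], [3, -3, 1]]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def char_poly(m, lam):
    """Evaluate det(m - lam * I); zero exactly when lam is an eigenvalue of m."""
    shifted = [[m[i][j] - (lam if i == j else 0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


def gram(m):
    """Return m^T m, whose eigenvalues are the squared singular values of m."""
    return [[sum(m[r][i] * m[r][j] for r in range(3)) for j in range(3)]
            for i in range(3)]


# Hand-computed eigenvalues quoted in the thread (rounded, so the
# residuals for A are small rather than exactly zero).
for lam in (11.3448, -0.515729, 0.170915):
    print("A eigenvalue residual:", lam, char_poly(A, lam))
for lam in (7, 3, 2):
    print("B eigenvalue residual:", lam, char_poly(B, lam))

# Values reported for case 8, tested as singular values of B.
for sigma in (7.964450219004663, 7.000000000000002, 0.753347668076679):
    print("B singular value residual:", sigma,
          char_poly(gram(B), sigma * sigma))
```

If the last three residuals come out near zero, the case 8 numbers are the singular values of B rather than its eigenvalues. For a symmetric matrix such as A, the singular values are simply the absolute values of the eigenvalues, which is consistent with case 4 being the one that matched.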
