-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey guys.

Thank you for all the information about Singular Value Decomposition.
The Lanczos algorithm seems to be a bad choice for small matrices. But
the Stochastic SVD with k = full rank and p = 0 (thanks Dmitriy
Lyubimov for implementing that) works fine.

So far, Markus

On 09/23/2011 08:46 PM, Dmitriy Lyubimov wrote:
> I already fixed full rank (p=0) on the trunk. It was just an
> invalid assertion, the algorithm isn't limiting that. So k=3 p=0
> should be ok now in the trunk.
>
> On Sep 23, 2011 8:34 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>> Markus,
>> 
>> Try testing on a 20x20 matrix if you want to use p>0. The issue
>> is that this is an approximation algorithm that works for
>> reasonably high dimension.
>> 3 is not reasonably high. 20 is probably marginal.
>> 
>> On Fri, Sep 23, 2011 at 4:42 PM, Dmitriy Lyubimov
>> <dlyubi...@apache.org> wrote:
>> 
>>> oh, ok, apparently you need to use p>0.
>>> 
>>> but then there's a problem: there's a k+p <= m (input height)
>>> requirement, so I guess this is a corner case i did not account
>>> for.
>>> 
>>> you can use k=2 and p=1; the caveat is that even though 3
>>> singular values will be computed, only 2 of them will be saved.
>>> this solver always assumes "thin" decomposition requirements,
>>> although the distinction is purely technical; it is only a matter
>>> of a patch to enable p=0.
>>> 
>>> It is only an issue because your input is so small. In practice,
>>> input is much "longer" than k+p rows so it hasn't come up as an
>>> issue. Point is, it will not do full rank decomposition with
>>> small matrices; but then, you don't want to use it with small
>>> matrices :)
>>> 
>>> although i can engineer a patch to allow p=0 and full rank
>>> decompositions for short wide matrices if it is that
>>> important.
>>> 
>>> -dmitriy
>>> 
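To make the k/p discussion above concrete, here is a minimal Python sketch. The function name `check_ssvd_params` and the `allow_zero_p` flag are made up for illustration and are not part of Mahout. The sketch assumes the requirement is that k+p must not exceed the input height m, and that p=0 is only accepted once the assertion mentioned in this thread has been relaxed.

```python
def check_ssvd_params(k, p, m, allow_zero_p=True):
    """Return a list of problems with (k, p) for an input with m rows.

    Hypothetical helper, not Mahout code. Assumes k + p <= m is required
    and that p = 0 is only legal when allow_zero_p is True.
    """
    problems = []
    if k < 1:
        problems.append("k must be at least 1")
    if p < 0 or (p == 0 and not allow_zero_p):
        problems.append("p must be >= 0" if allow_zero_p else "p must be > 0")
    if k + p > m:
        problems.append("k + p = %d exceeds input height m = %d" % (k + p, m))
    return problems


# The three combinations discussed in this thread for a 3-row input.
for k, p in [(3, 0), (2, 1), (3, 1)]:
    print(k, p, check_ssvd_params(k, p, m=3) or "ok")
```

With `allow_zero_p=False` the same helper mimics the behaviour before the fix, where k=3 p=0 was rejected.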
>>> On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann 
>>> <i...@markusholtermann.eu> wrote:
>>>> Thank you for all your responses.
>>>> 
>>>> ref. Dan Brickley:
>>>> ------------------
>>>> hopefully you did dream ;-)
>>>> 
>>>> ref. Dmitriy Lyubimov:
>>>> ----------------------
>>>> When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
>>>> IllegalArgumentException. You can find the traceback at
>>>> http://paste.pocoo.org/show/481168/ .
>>>> 
>>>> ref. Ted Dunning:
>>>> -----------------
>>>> I am running the M/R version of SVD in local mode. I didn't
>>>> install Hadoop except what is coming via `mvn install`. If I
>>>> understand the code correctly, the `--inMemory` argument is only
>>>> relevant for the "EigenVerificationJob" -- I didn't run that.
>>>> 
>>>> Here are the latest results for the calculations as described
>>>> in my previous mail:
>>>> 
>>>> For 1:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611: {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.0: {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
>>>> Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815: {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
>>>> Count: 3
>>>> 
>>>> For 2:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082: {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976: {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
>>>> Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533: {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
>>>> Count: 4
>>>> 
>>>> For 3:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587: {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231: {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
>>>> Count: 3
>>>> 
>>>> For 4:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208: {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401: {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022: {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
>>>> Key: 3: Value: eigenVector3, eigenvalue = 0.0: {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
>>>> Count: 4
>>>> 
>>>> For 5:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315: {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 0.0: {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
>>>> Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133: {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
>>>> Count: 3
>>>> 
>>>> For 6:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882: {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203: {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
>>>> Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079: {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
>>>> Count: 4
>>>> 
>>>> For 7:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162: {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868: {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.0: {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
>>>> Count: 3
>>>> 
>>>> For 8:
>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>> Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663: {0:NaN,1:NaN,2:NaN}
>>>> Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002: {0:NaN,1:NaN,2:NaN}
>>>> Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679: {0:NaN,1:NaN,2:NaN}
>>>> Key: 3: Value: eigenVector3, eigenvalue = 0.0: {0:NaN,1:NaN,2:NaN}
>>>> Count: 4
>>>> 
>>>> 
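A quick pure-Python sanity check on the eigenvalue column in case 8, offered as a sketch rather than a statement about what Mahout computes internally: if s is a singular value of B, then s^2 is an eigenvalue of B^T B, so det(B^T B - s^2 I) should vanish. The helper names (`det3`, `gram`, `residual`) are made up for this example; the matrix B and the three nonzero values are copied from this thread.

```python
# Matrix B and the nonzero "eigenvalue" entries reported for case 8.
B = [[5.0, 2.0, 4.0], [-3.0, 6.0, 2.0], [3.0, -3.0, 1.0]]
reported = [7.964450219004663, 7.000000000000002, 0.753347668076679]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def gram(m):
    """Return m^T m for a square list-of-lists matrix."""
    n = len(m)
    return [[sum(m[r][i] * m[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


def residual(m, mu):
    """Evaluate det(m - mu*I); this is zero when mu is an eigenvalue of m."""
    shifted = [[m[i][j] - (mu if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


BtB = gram(B)
for s in reported:
    print(s, residual(BtB, s * s))
```

Residuals close to zero would suggest that the values in case 8 are singular values of B rather than its eigenvalues. For a symmetric matrix such as A, singular values are the absolute values of the eigenvalues, so the two differ only in sign.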
>>>> ref. Danny Bickson:
>>>> -------------------
>>>> Thanks for your confirmation on how to use the rank. Regarding
>>>> the scale factor and orthogonalization: Yes, I take it into
>>>> account. I'm running SVD from trunk without any changes. And
>>>> even after commenting out those parts of the code, the results
>>>> are still wrong in the cases 1, 2, 3, 7 and 8.
>>>> 
>>>> Thank you for your help.
>>>> 
>>>> Markus
>>>> 
>>>> 
>>>>> On 22 Sep 2011, at 18:37, Markus Holtermann 
>>>>> <i...@markusholtermann.eu> wrote:
>>>>> 
>>>>>> Hello there,
>>>>>> 
>>>>>> I'm trying to run Mahout's Singular Value Decomposition
>>>>>> but realized that the resulting eigenvalues are wrong in
>>>>>> most cases. So I took two small 3x3 matrices and
>>>>>> calculated their eigenvalues and eigenvectors by hand and
>>>>>> compared the results to Mahout.
>>>>>> 
>>>>>> In only one of eight cases did the results from Mahout
>>>>>> match my pen & paper calculation.
>>>>>> 
>>>>>> Let's take A = {{1,2,3},{2,4,5},{3,5,6}} and B =
>>>>>> {{5,2,4},{-3,6,2},{3,-3,1}}
>>>>>> 
>>>>>> As you can see, A is symmetric, B is not.
>>>>>> 
>>>>>> I ran `mahout svd --output out/ --numRows 3 --numCols 3`
>>>>>> eight times with different arguments:
>>>>>> 
>>>>>> 1) --input A --rank 3 --symmetric true   result is wrong
>>>>>> 2) --input A --rank 4 --symmetric true   result is wrong
>>>>>> 3) --input A --rank 3 --symmetric false  result is wrong
>>>>>> 4) --input A --rank 4 --symmetric false  result is CORRECT
>>>>>> 
>>>>>> 5) --input B --rank 3 --symmetric true   result is wrong
>>>>>> 6) --input B --rank 4 --symmetric true   result is wrong
>>>>>> 7) --input B --rank 3 --symmetric false  result is wrong
>>>>>> 8) --input B --rank 4 --symmetric false  result is wrong
>>>>>> 
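The eight runs listed above can be enumerated mechanically. The following Python sketch only builds and prints the command lines; it does not invoke Mahout. The flags are the ones quoted in this message, while `BASE` and `build_commands` are names invented for the example, and "A" and "B" stand in for the actual input paths.

```python
from itertools import product

# Common prefix of every run, as quoted in this message.
BASE = "mahout svd --output out/ --numRows 3 --numCols 3"


def build_commands():
    """Return the eight command lines in the order they are numbered above."""
    commands = []
    # product() varies the last argument fastest: rank, then symmetric, then input.
    for matrix, symmetric, rank in product(["A", "B"], ["true", "false"], [3, 4]):
        commands.append("%s --input %s --rank %d --symmetric %s"
                        % (BASE, matrix, rank, symmetric))
    return commands


for number, command in enumerate(build_commands(), start=1):
    print("%d) %s" % (number, command))
```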
>>>>>> To verify that my input data is correct, this is the
>>>>>> result of `mahout seqdumper`
>>>>>> 
>>>>>> For A:
>>>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>>>> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
>>>>>> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
>>>>>> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
>>>>>> Count: 3
>>>>>> 
>>>>>> 
>>>>>> For B:
>>>>>> Key class: class org.apache.hadoop.io.IntWritable
>>>>>> Value Class: class org.apache.mahout.math.VectorWritable
>>>>>> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
>>>>>> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
>>>>>> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
>>>>>> Count: 3
>>>>>> 
>>>>>> 
>>>>>> And finally, the correct eigenvalues should be:
>>>>>> 
>>>>>> For A: λ1 = 11.3448, λ2 = -0.515729, λ3 = 0.170915
>>>>>> 
>>>>>> For B: λ1 = 7, λ2 = 3, λ3 = 2
>>>>>> 
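The hand-computed eigenvalues above can be cross-checked without any linear algebra library: an eigenvalue λ of a matrix M satisfies det(M - λI) = 0. The Python sketch below evaluates that determinant for each quoted value. The names `det3` and `char_poly` are made up for this example; the matrices and eigenvalues are copied from this message. For A the residuals should be small rather than exactly zero, because the eigenvalues are quoted to only about six significant digits.

```python
# Matrices and hand-computed eigenvalues as given in this message.
A = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]]
B = [[5.0, 2.0, 4.0], [-3.0, 6.0, 2.0], [3.0, -3.0, 1.0]]
EIGENVALUES_A = [11.3448, -0.515729, 0.170915]
EIGENVALUES_B = [7.0, 3.0, 2.0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def char_poly(m, lam):
    """Evaluate det(m - lam*I) for a 3x3 matrix m."""
    shifted = [[m[i][j] - (lam if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


for name, matrix, values in (("A", A, EIGENVALUES_A), ("B", B, EIGENVALUES_B)):
    for lam in values:
        print(name, lam, char_poly(matrix, lam))
```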
>>>>>> So, are there any known bugs in Mahout's SVD
>>>>>> implementation? Am I doing something wrong? Is this
>>>>>> algorithm known to produce wrong results?
>>>>>> 
>>>>>> Thanks in advance.
>>>>>> 
>>>>>> Markus
>>>> 
>>>> 
>>> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6DhEIACgkQA8JzLzUe2LNSHwCgpc/ZgUXPaq0aNwrbcPGH4AXB
MVgAnjrgbceGHNHcHheCPPGydoAvcr57
=DBHE
-----END PGP SIGNATURE-----
