So - it's always a little embarrassing to say "I don't understand",
but here goes. I can't claim to have strong linear algebra skills, and
don't mind admitting that, but I (hopefully) have a rough idea of what's
going on. I've now got to the basic stage where I've at least put a
matrix into the Hadoop SVD/Lanczos implementation (i.e.
https://issues.apache.org/jira/browse/MAHOUT-180) and got something
out again. But then I hit a wall...

My problem is that I was imagining the results would be three factor
matrices (which, when multiplied, would reproduce the original, and from
which I could take the left-most columns, as in various SVD tutorials).
Instead, I get:

11/02/25 10:03:11 INFO decomposer.DistributedLanczosSolver: Persisting 10 eigenVectors and eigenValues to: outpath/rawEigenvectors

which when unpacked with
http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization-reading.html
gives me

key 0 value: {0:-0.5695508206727358,1:-0.4285601649419706,2:-0.3882489326234163,3:-0.584132531205635}
key 1 value: {0:-0.2721655269759087,2:0.13608276348795434,3:-0.9525793444156804}
key 2 value: {0:0.022062855712982308,1:0.7148365251006306,2:0.6927736693876481,3:0.09266399399452617}
key 3 value: {0:0.783849515338196,1:0.3919247576690979,2:-0.3919247576690983,3:-0.2799462554779274}
key 4 value: {0:0.557690858476082,1:-0.5791405068790082,2:0.5898653310804713,3:-0.0750737694102418}
key 5 value: {0:-0.22447685082502516,1:-0.5158243682948284,2:0.8253228908193994,3:0.04875951606025693}
key 6 value: {0:0.13483997249264842,1:-0.13483997249264842,2:-0.9438798074485389,3:0.26967994498529685}
key 7 value: {0:-0.6758100682735698,1:0.693089266667016,2:0.2503084269657081,3:-0.01592832197702688}
key 8 value: {0:0.4104908741187378,1:0.26436466202790915,2:0.32473155620641553,3:-0.8100357918881508}

...i.e. a single grid of values. Now
http://en.wikipedia.org/wiki/Singular_value_decomposition#Relation_to_eigenvalue_decomposition
and http://www.scribd.com/doc/7017586/Gorrell-Webb tell me that these
are intimately related to the three SVD matrices, but for a novice the
connection isn't entirely clear.
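A small sketch to make the dump concrete. `parse_dump` is a hypothetical helper written for this mail (it is not part of Mahout or of the reader on the blog above); it turns each `key N value: {index:value,...}` line into a dense vector of length 4 (the `--numCols 4` used in the job), treating a missing index, such as 1 in key 1, as 0.0. The vectors come out with (approximately) unit length, but nine vectors in a 4-dimensional space cannot all be mutually orthogonal, so at most four of them can belong to one orthonormal set of singular vectors; keys 0 and 1, for instance, have a dot product of roughly 0.66.

```python
import math
import re

# The nine "key N value: {...}" lines printed above, copied verbatim.
DUMP = """\
key 0 value: {0:-0.5695508206727358,1:-0.4285601649419706,2:-0.3882489326234163,3:-0.584132531205635}
key 1 value: {0:-0.2721655269759087,2:0.13608276348795434,3:-0.9525793444156804}
key 2 value: {0:0.022062855712982308,1:0.7148365251006306,2:0.6927736693876481,3:0.09266399399452617}
key 3 value: {0:0.783849515338196,1:0.3919247576690979,2:-0.3919247576690983,3:-0.2799462554779274}
key 4 value: {0:0.557690858476082,1:-0.5791405068790082,2:0.5898653310804713,3:-0.0750737694102418}
key 5 value: {0:-0.22447685082502516,1:-0.5158243682948284,2:0.8253228908193994,3:0.04875951606025693}
key 6 value: {0:0.13483997249264842,1:-0.13483997249264842,2:-0.9438798074485389,3:0.26967994498529685}
key 7 value: {0:-0.6758100682735698,1:0.693089266667016,2:0.2503084269657081,3:-0.01592832197702688}
key 8 value: {0:0.4104908741187378,1:0.26436466202790915,2:0.32473155620641553,3:-0.8100357918881508}
"""

NUM_COLS = 4  # --numCols from the job: one entry per column of A


def parse_dump(text, dim):
    """Hypothetical helper: 'key N value: {i:x,...}' lines -> {N: dense list}.

    Indices absent from the sparse listing are filled with 0.0.
    """
    vectors = {}
    for line in text.strip().splitlines():
        m = re.match(r"key (\d+) value: \{(.*)\}", line.strip())
        if not m:
            continue
        dense = [0.0] * dim
        for pair in m.group(2).split(","):
            idx, val = pair.split(":")
            dense[int(idx)] = float(val)
        vectors[int(m.group(1))] = dense
    return vectors


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


vecs = parse_dump(DUMP, NUM_COLS)
for k, v in sorted(vecs.items()):
    print(k, "norm = %.6f" % math.sqrt(dot(v, v)),
          "dot with key 0 = %+.4f" % dot(v, vecs[0]))
```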

I'll copy details of the specific job/data I tried below, but I guess
the basic issue is more one of documentation for tool-oriented rather
than math-oriented users. So consider this a case study in
misunderstanding. If the answer is "you need to (re)learn a bit more
maths", that's a fine outcome. If I get my head around this, I'll try
to reflect what I learn back into the wiki.

So I was inspired to dig into SVD by running across a few friendly
tutorials like
http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
and I tried to stick with their example for my original test, and also
to walk through it in Matlab/Octave.

So my matrix (in Matlab-ese) was their 'Family Guy Seasons x Users'
example, where each element is a user's rating for a season:

A = [5,5,0,5; 5,0,3,4; 3,4,0,3; 0,0,5,3; 5,4,4,5; 5,4,5,5]
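Written out for this 6 x 4 A, the three-matrix factorization I was expecting, and its link to an eigen-decomposition as described on the Wikipedia page, look like this (standard thin-SVD notation; my own summary, so worth checking against those pages):

```latex
\begin{aligned}
A &= U \Sigma V^{T}, \qquad U \in \mathbb{R}^{6 \times 4},\;
     \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \sigma_3, \sigma_4),\;
     V \in \mathbb{R}^{4 \times 4},\;
     \sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \sigma_4 \ge 0 \\
A^{T} A &= V \Sigma U^{T} U \Sigma V^{T} = V \Sigma^{2} V^{T}
     \qquad (U^{T} U = I_4) \\
A A^{T} &= U \Sigma^{2} U^{T} \\
u_i &= \frac{A v_i}{\sigma_i} \quad (\sigma_i > 0), \qquad
A_k = U_k \Sigma_k V_k^{T} \;\text{(first } k \text{ columns of } U, V \text{ and first } k \text{ values of } \Sigma)
\end{aligned}
```

So the eigenvectors of the 4 x 4 matrix A^T A are the columns v_i of V, its eigenvalues are the squared singular values, and U can be recovered column by column from A and V. That would fit the four-entry vectors in the dump being candidate columns of V, though I haven't confirmed this in the Mahout code.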

I converted it to Mahout binary using the tool at
http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html
with the following input csv data:

0,0,5.0
0,1,5.0
0,3,5.0
1,0,5.0
1,2,3.0
1,3,4.0
2,0,3.0
2,1,4.0
2,3,3.0
3,2,5.0
3,3,3.0
4,0,5.0
4,1,4.0
4,2,4.0
4,3,5.0
5,0,5.0
5,1,4.0
5,2,5.0
5,3,5.0

(Note that I just skipped the zero elements; is that appropriate/correct?)
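As a sanity check on skipping the zeros: if every (row, col) pair that is not listed is read back as 0.0, which is how a sparse representation normally treats absent entries, the 19 triples encode exactly the same matrix. A sketch with hypothetical helpers `to_triples`/`from_triples`, assuming (not verified) that the conversion tool fills gaps with zeros:

```python
# Ratings matrix from the mail: 6 seasons (rows) x 4 users (columns).
A = [
    [5, 5, 0, 5],
    [5, 0, 3, 4],
    [3, 4, 0, 3],
    [0, 0, 5, 3],
    [5, 4, 4, 5],
    [5, 4, 5, 5],
]


def to_triples(m):
    """Emit (row, col, value) for every non-zero entry, row by row."""
    return [(i, j, float(v))
            for i, row in enumerate(m)
            for j, v in enumerate(row)
            if v != 0]


def from_triples(triples, n_rows, n_cols):
    """Rebuild a dense matrix; any (row, col) not listed is taken as 0.0."""
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in triples:
        dense[i][j] = v
    return dense


triples = to_triples(A)
lines = ["%d,%d,%s" % t for t in triples]  # same "row,col,value" layout as the CSV
print("\n".join(lines))
```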

On the Hadoop cluster I blundered my way into the following:

hadoop jar ./mahout-examples-0.5-SNAPSHOT-job.jar \
  org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver \
  --input svdoutput.mht  --output outpath --numRows 6 --numCols 4 --rank 10

...which is where I got the values given at the start of this mail. I've
poked around in Octave with
http://www.mathworks.com/help/techdoc/ref/eig.html and
http://www.mathworks.com/help/techdoc/ref/svd.html but I've really hit
my limit here, I think.
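For comparison with Octave's `[V, D] = eig(A' * A)` and `svd(A)`, here is a plain-Python sketch of the route the Wikipedia section describes: eigen-decompose A^T A, take sigma_i = sqrt(lambda_i), and recover u_i = A v_i / sigma_i. The eigen-decomposition here uses cyclic Jacobi rotations, a swapped-in textbook method and not what Mahout's Lanczos solver does; the helper names are hypothetical. Multiplying U, diag(S) and V^T back together reproduces A. Note also that a 6 x 4 matrix has at most min(6, 4) = 4 non-zero singular values, so `--rank 10` asks for more vectors than can exist. The first column of V can be compared with key 0 of the dump (eigenvectors are only defined up to sign).

```python
import math

# Ratings matrix from the mail: 6 seasons (rows) x 4 users (columns).
A = [
    [5, 5, 0, 5],
    [5, 0, 3, 4],
    [3, 4, 0, 3],
    [0, 0, 5, 3],
    [5, 4, 4, 5],
    [5, 4, 5, 5],
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain-list matrix product x @ y."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def jacobi_eigen(sym, tol=1e-20, max_sweeps=100):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix.

    Returns (eigenvalues, V); column k of V is the unit eigenvector for eigenvalues[k].
    """
    n = len(sym)
    a = [[float(x) for x in row] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # v <- v J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def svd_via_eigen(m):
    """Thin SVD of m (full column rank assumed) built from the eigenvectors of m^T m."""
    evals, v = jacobi_eigen(matmul(transpose(m), m))
    order = sorted(range(len(evals)), key=lambda k: evals[k], reverse=True)
    sigmas = [math.sqrt(max(evals[k], 0.0)) for k in order]
    v_cols = [[row[k] for row in v] for k in order]        # right singular vectors v_i
    u_cols = [[sum(r * x for r, x in zip(row, vc)) / sig for row in m]
              for vc, sig in zip(v_cols, sigmas)]          # u_i = A v_i / sigma_i
    return transpose(u_cols), sigmas, transpose(v_cols)


U, S, V = svd_via_eigen(A)
US = [[U[i][k] * S[k] for k in range(len(S))] for i in range(len(U))]
recon = matmul(US, transpose(V))                           # U diag(S) V^T
print("singular values:", [round(s, 4) for s in S])
print("first column of V:", [round(row[0], 4) for row in V])
print("max |U S V^T - A|:",
      max(abs(recon[i][j] - A[i][j]) for i in range(6) for j in range(4)))
```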

Thanks for any pointers or other advice,

cheers,

Dan

PS: re wiki documentation, how do you all feel about continuing to use
the example in
http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/ ?
Maybe it would be good to have Matlab equivalents in there too.
