Hi Matt,

I ran into the same issue a few months ago; here's the thread from the mailing list archives [1]. Also check out this PDF [2], which is more explicit about what each of the ssvd command-line parameters does.
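
To spell out the math in the meantime, here is a minimal sketch. It uses NumPy on a small random matrix rather than Mahout, so every name in it (A, k, U_k, s_k, V_k) is made up for illustration and nothing here is a Mahout command or flag. Assuming the rows of your tf-idf matrix are documents, and writing the decomposition as A ~ U * Sigma * V^T, the reduced document vectors are A * V, which is the same thing as U * Sigma:

```python
import numpy as np

# Toy stand-in for a tf-idf matrix: 6 documents (rows) x 5 terms (columns).
# Assumption: random data, used only to illustrate the algebra.
rng = np.random.default_rng(0)
A = rng.random((6, 5))

k = 2  # target dimensionality (hypothetical choice)

# Exact thin SVD: A = U @ diag(s) @ Vt. (ssvd computes a stochastic
# approximation of the top-k part of this; here we just truncate.)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]      # documents x k
s_k = s[:k]         # top-k singular values
V_k = Vt[:k, :].T   # terms x k

# Two equivalent ways to get the k-dimensional document vectors:
reduced_via_V = A @ V_k    # project the original rows onto V
reduced_via_U = U_k * s_k  # scale each column of U by its singular value

assert np.allclose(reduced_via_V, reduced_via_U)
print(reduced_via_V.shape)  # (6, 2)
```

So, as far as the math goes, you either multiply the original matrix by V or scale the columns of U by the singular values. Both give one k-dimensional row per document, which is the matrix you would then cluster. How that maps onto the ssvd options should be covered in [2].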

Cheers,
Chris

[1] http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%3ccabcmrknw7wwpnwdambgnkslb17ijb+sbcep9smdg2yevu9c...@mail.gmail.com%3E
[2] https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CCgQFjAB&url=https%3A%2F%2Fcwiki.apache.org%2FMAHOUT%2Fstochastic-singular-value-decomposition.data%2FSSVD-CLI.pdf&ei=bHuBULqUJtCChQfO2ICADw&usg=AFQjCNGXl96KMqCNt9V6KmSNwLkPyjk7sA&cad=rja

On Fri, Oct 19, 2012 at 11:06 AM, Matt Molek <[email protected]> wrote:
> Sorry for the basic question. I've been reading about this for a few hours,
> but I'm still confused. I want to use ssvd to reduce the dimensionality of
> some tfidf-vectors so I can perform clustering on the result.
>
> Among many other things, I've read:
> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
>
> Which states the process for svd is:
>
> bin/mahout svd (original -> svdOut)
> bin/mahout cleansvd ...
> bin/mahout transpose svdOut -> svdT
> bin/mahout transpose original -> originalT
> bin/mahout matrixmult originalT svdT -> newMatrix
> bin/mahout kmeans newMatrix
>
> I know you don't need to do cleansvd with ssvd output. My main question is
> which of the three outputs of ssvd should I be transposing and multiplying
> with the original tfidf-matrix? I'm having trouble understanding the math
> that's going on.
>
> ssvd outputs U, V, and sigma, and despite reading a bunch, I'm still
> confused on which of these outputs I should be using, and how. Could anyone
> spell it out for me?
>
> Thanks for any help,
> Matt
>
