Hi Matt,

I ran into the same issue a few months ago; here's the thread from the
mailing list archives [1]. Also, check out this PDF [2]: it explains the
various ssvd command-line parameters in more detail.

Cheers,
Chris

[1] http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%3ccabcmrknw7wwpnwdambgnkslb17ijb+sbcep9smdg2yevu9c...@mail.gmail.com%3E
[2] https://cwiki.apache.org/MAHOUT/stochastic-singular-value-decomposition.data/SSVD-CLI.pdf

On Fri, Oct 19, 2012 at 11:06 AM, Matt Molek wrote:

> Sorry for the basic question. I've been reading about this for a few hours,
> but I'm still confused. I want to use ssvd to reduce the dimensionality of
> some tfidf-vectors so I can perform clustering on the result.
>
> Among many other things, I've read:
> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
>
> Which states the process for svd is:
>
> bin/mahout svd (original -> svdOut)
> bin/mahout cleansvd ...
> bin/mahout transpose svdOut -> svdT
> bin/mahout transpose original -> originalT
> bin/mahout matrixmult originalT svdT -> newMatrix
> bin/mahout kmeans newMatrix
>
> I know you don't need to do cleansvd with ssvd output. My main question is
> which of the three outputs of ssvd should I be transposing and multiplying
> with the original tfidf-matrix? I'm having trouble understanding the math
> that's going on.
>
> ssvd outputs U, V, and sigma, and despite reading a bunch, I'm still
> confused about which of these outputs I should be using, and how. Could
> anyone spell it out for me?
>
> Thanks for any help,
> Matt
>
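For the archive, here is a small sketch of the math behind the question. Assuming the tfidf vectors are the rows of the input matrix A, a rank-k decomposition gives A ≈ U Σ V^T, and because V has orthonormal columns, projecting the rows of A onto V gives A V = U Σ. The transpose/matrixmult steps in the wiki recipe appear to be computing that projection A V; with ssvd's outputs, the same k-dimensional document coordinates should be available as U scaled by sigma, without going back to the original matrix. The matrices below are made up (hand-picked orthonormal factors, not real ssvd output), and the check is plain Python, not Mahout:

```python
# Hypothetical toy example: A = U * diag(sigma) * V^T built from hand-picked
# orthonormal factors (4 "documents" x 3 "terms", rank 2). Not Mahout code.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

# U: 4x2 with orthonormal columns (one row per document)
U = [[0.5, 0.5],
     [0.5, -0.5],
     [0.5, 0.5],
     [0.5, -0.5]]
# V: 3x2 with orthonormal columns (one row per term)
V = [[1.0, 0.0],
     [0.0, 0.6],
     [0.0, 0.8]]
sigma = [3.0, 1.0]
Sigma = [[sigma[0], 0.0],
         [0.0, sigma[1]]]

# Rebuild the "tfidf" matrix A = U * Sigma * V^T
A = matmul(matmul(U, Sigma), transpose(V))

# Route 1: project the original rows onto V (the transpose + matrixmult idea)
projected = matmul(A, V)

# Route 2: scale each column of U by its singular value; A is not needed
u_sigma = [[u * s for u, s in zip(row, sigma)] for row in U]

# Both routes give the same 2-dimensional coordinates for every document
for p_row, us_row in zip(projected, u_sigma):
    assert all(abs(p - q) < 1e-9 for p, q in zip(p_row, us_row))

print([[round(x, 6) for x in row] for row in projected])
```

Each row of the result is one document reduced from 3 term dimensions to 2, which is the kind of matrix you would then hand to kmeans. Whether to cluster on U alone or on U scaled by sigma is a modelling choice; the identity above only shows that U Σ matches the projection the wiki recipe builds.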