Re: How to use ssvd for dimensionality reduction of tfidf-vectors?

Dmitriy Lyubimov Fri, 19 Oct 2012 09:48:39 -0700

The ssvd process for dimensionality reduction is easier. Assuming your data
points are row vectors of the input (which is the case with the output of
Mahout's seq2sparse), you need the U*Sigma output of the PCA flow.


I.e., you need something like
mahout ssvd -i input -o output -k 80 -pca true -us true -U false -V false...
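
To spell out the math behind that choice, here is a small NumPy sketch (not
Mahout code). It assumes the PCA flow amounts to subtracting the column means
and then taking a rank-k SVD; the matrix `a`, the helper `pca_reduce`, and the
sizes are made up for illustration. It shows that the rows of U*Sigma are the
reduced data points, and that they equal the centered input multiplied by V,
so no separate transpose/multiply pass should be needed.

```python
import numpy as np


def pca_reduce(a, k):
    """Return (U_k * Sigma_k, V_k) for the column-centered matrix `a`.

    Hypothetical helper for illustration only; this is a dense, exact SVD,
    not the stochastic algorithm that ssvd runs.
    """
    centered = a - a.mean(axis=0)           # PCA step: subtract column means
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    us = u[:, :k] * s[:k]                   # scale each column of U_k by its singular value
    return us, vt[:k].T                     # V_k has one column per retained component


rng = np.random.default_rng(0)
a = rng.random((20, 6))                     # 20 "documents", 6 "terms" (toy sizes)
us, v = pca_reduce(a, 3)

# Each row of `us` is the 3-dimensional representation of one input row.
print(us.shape)

# U*Sigma is the same as projecting the centered data onto V.
assert np.allclose(us, (a - a.mean(axis=0)) @ v)
```

The rows of `us` are what you would hand to clustering; V is only needed if
you want to project new rows into the same reduced space later.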

This information is also in the latest ssvd manual on the wiki.

Take the latest trunk. Some of the PCA flow components got broken recently,
and I fixed them just last week.
On Oct 19, 2012 9:06 AM, "Matt Molek" <[email protected]> wrote:

> Sorry for the basic question. I've been reading about this for a few hours,
> but I'm still confused. I want to use ssvd to reduce the dimensionality of
> some tfidf-vectors so I can perform clustering on the result.
>
> Among many other things, I've read:
> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
>
> Which states the process for svd is:
>
> bin/mahout svd (original -> svdOut)
> bin/mahout cleansvd ...
> bin/mahout transpose svdOut -> svdT
> bin/mahout transpose original -> originalT
> bin/mahout matrixmult originalT svdT -> newMatrix
> bin/mahout kmeans newMatrix
>
> I know you don't need to do cleansvd with ssvd output. My main question is
> which of the three outputs of ssvd should I be transposing and multiplying
> with the original tfidf-matrix? I'm having trouble understanding the math
> that's going on.
>
> ssvd outputs U, V, and sigma, and despite reading a bunch, I'm still
> confused on which of these outputs I should be using, and how. Could anyone
> spell it out for me?
>
> Thanks for any help,
> Matt
>
