Here are the steps if you are using Mahout-mrlegacy in the current Mahout trunk:

1. Generate tfidf vectors from the input corpus using seq2sparse (I am
assuming you have done this before, so I am skipping the details)

2. Run SSVD on the generated tfidf vectors from (1)

      ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true -us true -U false -V false

     k = number of reduced basis vectors, i.e. the rank of the decomposition

    You will need the U*Sigma output of the PCA flow (requested with
-us true) as the input to the clustering step

3. Run KMeans (or any other clustering algorithm) with the U*Sigma output
from (2) as input.
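
The three steps above could be chained roughly as in the script below. This
is only a sketch: every directory name is a hypothetical placeholder, the
cluster count (-k 20), iteration limit (-x 10) and Euclidean distance
measure for kmeans are my own picks, and I am assuming that seq2sparse
writes its vectors under "tfidf-vectors" and that ssvd writes U*Sigma under
"USigma" in the output directory, so please check those against your own
run. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of the tfidf -> SSVD/PCA -> KMeans flow. All paths are placeholders.
MAHOUT="${MAHOUT:-./bin/mahout}"
SEQ_DIR="${SEQ_DIR:-corpus-seq}"          # hypothetical: SequenceFiles of the corpus
VEC_DIR="${VEC_DIR:-corpus-vectors}"      # hypothetical: seq2sparse output
SVD_DIR="${SVD_DIR:-corpus-svd}"          # hypothetical: ssvd output
SEED_DIR="${SEED_DIR:-kmeans-seed}"       # hypothetical: initial centroids
KMEANS_DIR="${KMEANS_DIR:-kmeans-out}"    # hypothetical: clustering output

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

pipeline() {
  # Step 1: tfidf vectors from the corpus.
  run "$MAHOUT" seq2sparse -i "$SEQ_DIR" -o "$VEC_DIR" -wt tfidf -nv

  # Step 2: SSVD in PCA mode, keeping only U*Sigma (flags as in step 2 above).
  # Assumption: the tfidf vectors are under $VEC_DIR/tfidf-vectors.
  run "$MAHOUT" ssvd -i "$VEC_DIR/tfidf-vectors" -o "$SVD_DIR" \
      -k 80 -pca true -us true -U false -V false

  # Step 3: KMeans on the reduced vectors.
  # Assumption: U*Sigma is written under $SVD_DIR/USigma.
  run "$MAHOUT" kmeans -i "$SVD_DIR/USigma" -c "$SEED_DIR" -o "$KMEANS_DIR" \
      -k 20 -x 10 \
      -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -cl -ow
}

pipeline
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 to
actually execute them.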


On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan <prince.don...@googlemail.com>
wrote:

> Hello Mahout users,
>
> I'm working on text clustering and would like to reduce the number of
> features to improve the clustering process.
> I would like to apply Singular Value Decomposition before the clustering
> step. I would be thankful to hear from anyone who has used this before. Is
> it a good idea for clustering?
> Is there any other method in Mahout to reduce the text features before
> clustering?
> Does anyone have an idea how I can apply SVD using Java code?
>
> Thanks in advance,
> Donni
>
