Hey Ted,

As suggested, I tried SSVD with Mahout 0.8. I think the issue of NamedVector 
not propagating to the output U still persists.
Here is what I have done:

1. Created the feature vectors using "seq2sparse" with the -nv option. I checked 
the output, and named vectors were created.
2. Passed these feature vectors to "ssvd" with the params "-k 100 -U true -V true". 
After execution, 3 outputs were generated: sigma, U and V.
3. Dumped "U" to check whether the output contained NamedVectors:

mahout-distribution-0.7/bin/mahout seqdumper -i /stuti/SSVD/Output/U | more     
Output: 
Key: /File_1: Value: 
{0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....} 

I did not see any NamedVector in the output of ssvd. Please point out if I 
have missed a step in between.

Since I wanted to perform clustering, I took the output "U" and generated 
NamedVectors from it with custom code. The output looks like this:
Key: /File_1: Value: 
/File_1:{0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
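
For illustration only, the re-labelling step amounts to copying each row key 
of U onto its vector as a name. The sketch below is not the custom code that 
was actually run against Mahout; it is a plain Python model of the idea, and 
the function names, the dict layout and the sample numbers are all 
hypothetical.

```python
def name_rows(rows):
    """Attach each row key to its vector so the name travels with the values.

    `rows` is an iterable of (key, vector) pairs, where `vector` maps a
    column index to a value, mirroring what seqdumper prints for U.
    """
    named = []
    for key, vector in rows:
        # The row key (e.g. "/File_1") becomes the vector's name.
        named.append({"name": key, "values": dict(vector)})
    return named


def format_named(named_vector):
    """Render a named vector as name:{index:value,...}."""
    body = ",".join(
        f"{i}:{v}" for i, v in sorted(named_vector["values"].items())
    )
    return f"{named_vector['name']}:{{{body}}}"


# Hypothetical two-row sample; the numbers are made up for illustration.
sample_rows = [
    ("/File_1", {0: 0.5, 1: 0.25}),
    ("/File_2", {0: 0.125, 1: 0.75}),
]
named_rows = name_rows(sample_rows)
```

Calling format_named(named_rows[0]) on the sample gives a string in the same 
"name:{...}" shape as the dump above.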

Then I fed this NamedVector file to KMeans to generate 10 clusters, using 
random centroid selection. 
Finally, I dumped the cluster output as:
<ClusterId>,<DocumentID1>,<DocumentId2>.....
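
As a rough sketch of how that dump format can be produced, assume the 
(cluster id, document id) assignments have already been read from the 
clustering output. The function name and the sample assignments below are 
hypothetical, and this is not the code that was actually used.

```python
from collections import defaultdict


def cluster_lines(assignments):
    """Group (cluster_id, doc_id) pairs into one CSV line per cluster."""
    groups = defaultdict(list)
    for cluster_id, doc_id in assignments:
        groups[cluster_id].append(str(doc_id))
    # One line per cluster: the cluster id first, then its documents
    # in the order they were seen.
    return [
        ",".join([str(cluster_id)] + docs)
        for cluster_id, docs in sorted(groups.items())
    ]


# Hypothetical assignments; real ids would come from the KMeans output.
sample_assignments = [(3, "/File_1"), (7, "/File_2"), (3, "/File_3")]
lines = cluster_lines(sample_assignments)
```

Each element of `lines` is one cluster, with the document ids taken from the 
vector names, which is why the names need to survive the SSVD step.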

Please let me know if I have made any mistake in the end-to-end execution. 
Also, I am not sure why the SSVD output does not contain named vectors, given 
that the issue is fixed.

Please advise.

Regards
Stuti Awasthi





-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Thursday, August 01, 2013 8:37 PM
To: [email protected]
Subject: Re: How to SSVD output to generate Clusters

On Thu, Aug 1, 2013 at 5:49 AM, Stuti Awasthi <[email protected]> wrote:

> I think there is a problem because of NamedVector as after some search 
> I get this Jira. https://issues.apache.org/jira/browse/MAHOUT-1067
>

Note also that this bug is fixed in 0.8

