Hey Dmitriy,
Sorry for the late reply.
I re-ran the steps to generate U. Here is the output:
1. "seq2sparse" output: contains a NamedVector
Key: /Description_10: Value:
/Description_10:{554:1.0,54:1.0,514:1.0,322:1.0,247:1.0,91:1.0,127:1.0,456:1.0,480:1.0,713:1.0,674:1.0,117:1.0,461:1.0,595:2.0,446:1.0,296:2.0}
2. "ssvd" output of U: does not contain a NamedVector
Key: /Description_10: Value:
{0:0.010564205396743468,1:-0.01989403962804719,2:0.015640314765729225,
3:0.04031717183780774,4:-0.03325995251075869,5:0.0294201152018514,
6:0.03834130611889856,7:-0.008686421005328312,8:-0.06164515883823538,
9:0.03752875772953153,10:0.04739786931946798,11:-0.07912744917669134,
12:0.020078421275704143,13:-0.04017504785907734,14:0.012539132502559502,
15:0.0733073647645918,16:-0.02111033727307056,17:0.0799478317610193,
18:-0.08481960414593219,19:-0.06987848875856222,20:0.036932920091059446,
21:-0.06949180571421532,22:-0.03447267994522256,23:-0.07104196347181493,
24:0.022262180555421562,25:-0.0485632586340187,26:-0.05380823388650383,
27:0.09299533887785207,28:0.0019344239524856396,29:0.002936116541403362,
30:-0.07249587007236825,31:0.0016026176038041033,32:-0.0711115256224166,
33:0.06603931206284432,34:0.019226806201249697,35:0.13972781245330326,
36:0.0787696939450401,37:0.07065356340476747,38:0.08437107545490818,
39:0.06381670380272558,40:0.046405964753673735,41:0.0601332388594578,
42:-0.12996454299711707,43:0.10779361589915878,44:-0.06524702754474347,
45:0.014785171162887613,46:-0.036630574690084586,47:-0.15066656149902793,
48:0.16190482591405958,49:-0.00869851116149916}
As you said, keys should propagate to keys and/or names to names, but the
value in U carries no name. Any idea what is going wrong, or whether there is
a mistake on my side?
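
For reference, this is roughly how the two dumps can be compared without eyeballing them. It is only a sketch in Python: it assumes each record of the text dump sits on one line in the layout shown above ("Key: <key>: Value: <value>", with a "<name>:" prefix in front of the braces when the value is a NamedVector). The helper names vector_name and unnamed_keys are made up for this example and are not part of Mahout.

```python
import re

# Assumed record layout, taken from the dumps quoted in this thread:
#   Key: <key>: Value: <name>:{i:v,...}   (NamedVector)
#   Key: <key>: Value: {i:v,...}          (plain vector)
RECORD = re.compile(r"^Key: (?P<key>.*?): Value: (?P<value>.*)$")


def vector_name(value):
    """Return the name prefix of a dumped vector, or None if it has none."""
    brace = value.find("{")
    if brace <= 0:
        # No brace at all, or the value starts with "{": no name prefix.
        return None
    prefix = value[:brace]
    return prefix[:-1] if prefix.endswith(":") else None


def unnamed_keys(lines):
    """Hypothetical helper: keys whose dumped value carries no name."""
    missing = []
    for line in lines:
        match = RECORD.match(line.strip())
        if match and vector_name(match.group("value")) is None:
            missing.append(match.group("key"))
    return missing
```

Feeding it the lines of the seq2sparse dump should return an empty list, while the lines of the U dump above would return every key, which is the behaviour I am reporting.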
Thanks
Stuti Awasthi
-----Original Message-----
From: Dmitriy Lyubimov [mailto:[email protected]]
Sent: Friday, August 02, 2013 11:39 PM
To: [email protected]
Subject: Re: How to SSVD output to generate Clusters
From eyeballing the code, I don't see a problem. If the rows of A are named
values, then the rows of U (or U*Sigma or U*Sigma^1/2) would also retain the
names from the values of the rows of A. The output would not contain
NamedVector values for rows that were not NamedVector values themselves in the
input. Does the seq2sparse output create NamedVectors as values?
Note that if what you want is to have *keys* from seq2sparse (such as
'file_1') propagated to the name of the named vector value in U, that much is
not happening. The algorithm propagates keys to keys and/or names to names (but
not any other combination of those).
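
To spell that rule out, here is a toy model in Python. It is not Mahout code: Row, propagate, and name_from_key are made-up names for illustration, and the rows of U are simply passed in rather than computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Row:
    """Toy stand-in for a vector value; name is None for an unnamed vector."""
    values: Dict[int, float]
    name: Optional[str] = None


def propagate(a_rows: List[Tuple[str, Row]],
              u_values: List[Dict[int, float]]) -> List[Tuple[str, Row]]:
    """Model of the rule: key -> key and name -> name, never key -> name."""
    out = []
    for (key, a_row), values in zip(a_rows, u_values):
        # The output row keeps the input key, and keeps the input name only
        # if the input value had one.
        out.append((key, Row(values, a_row.name)))
    return out


def name_from_key(rows: List[Tuple[str, Row]]) -> List[Tuple[str, Row]]:
    """Separate post-processing step: copy each key into the value's name."""
    return [(key, Row(row.values, key)) for key, row in rows]
```

In this model a row of A without a name yields a row of U without a name, even though the key is carried over. Getting the key into the name needs an extra pass such as name_from_key.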
-d
On Fri, Aug 2, 2013 at 10:42 AM, Dmitriy Lyubimov <[email protected]> wrote:
> it should. i worked on the issue and last time it was checked it was
> still working with name propagation. if not, then it is a bug
>
>
> On Fri, Aug 2, 2013 at 3:33 AM, Stuti Awasthi <[email protected]> wrote:
>
>> Hey Ted,
>>
>> As suggested, I tried SSVD with Mahout 0.8. I think the issue of
>> NamedVector not propagating to the output U still persists.
>> Here is what I have done:
>>
>> 1. Created the feature vectors using "seq2sparse" with the -nv option.
>> Checked the output; named vectors were created.
>> 2. Provided these feature vectors to "ssvd" with params "-k 100 -U true
>> -V true". After execution, three outputs were generated: sigma, U,
>> and V.
>> 3. I dumped "U" to check whether the output contained NamedVectors:
>>
>> mahout-distribution-0.7/bin/mahout seqdumper -i /stuti/SSVD/Output/U
>> | more
>> Output:
>> Key: /File_1: Value:
>> {0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
>>
>> I did not see the NamedVector getting created in the output of ssvd.
>> Please point out if I have missed any step in between.
>>
>> As I wanted to perform the Clustering, I took the output of "U" and
>> generated the NamedVector with custom code. The output looks like this :
>> Key: /File_1: Value:
>> /File_1:{0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
>>
>> Then I fed this NamedVector file to KMeans to generate 10 clusters,
>> using random centroid selection.
>> Finally I dumped the cluster output as:
>> <ClusterId>,<DocumentID1>,<DocumentId2>.....
>>
>> Please let me know if I have made any mistake in the end-to-end
>> execution. I am also not sure why the SSVD output is not generating
>> named vectors, given that the issue is fixed.
>>
>> Please suggest
>>
>> Regards
>> Stuti Awasthi
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Ted Dunning [mailto:[email protected]]
>> Sent: Thursday, August 01, 2013 8:37 PM
>> To: [email protected]
>> Subject: Re: How to SSVD output to generate Clusters
>>
>> On Thu, Aug 1, 2013 at 5:49 AM, Stuti Awasthi <[email protected]>
>> wrote:
>>
>> > I think there is a problem because of NamedVector; after some
>> > searching I found this Jira:
>> > https://issues.apache.org/jira/browse/MAHOUT-1067
>> >
>>
>> Note also that this bug is fixed in 0.8
>>
>
>