On Wed, Aug 7, 2013 at 10:44 PM, Dmitriy Lyubimov <[email protected]> wrote:
> @Stuti:
>
> OK, so I added additional tests (there was a small bug in the test), and
> in local tests the keys and the named vector names both propagate on all
> execution paths. So I cannot confirm the no-propagation problem (this is
> on 0.9 trunk).
>
> This test mod is in MAHOUT-1306, already pushed to trunk. I know this
> probably doesn't resolve your particular situation, but I don't see a
> problem with SSVD itself wrt name propagation, either for keys or for
> named vector values. I know Pat was doing this successfully. Let me know
> if you find something.
>
> > Sent: Friday, August 02, 2013 11:39 PM
> > To: [email protected]
> > Subject: Re: How to SSVD output to generate Clusters
> >
> > By eyeballing the code, I don't think I see a problem. If rows of A are
> > named values, then rows of U (or U*Sigma or U*Sigma^1/2) would also
> > retain the names from the values of rows of A. Output would not contain
> > NamedVector values for rows that were not NamedVector values themselves
> > in the input. Does seq2sparse output create NamedVectors as values?
> >
> > Note that if what you want is to have *keys* from seq2sparse (such as
> > 'file_1') propagated to the name of the named vector value in U, that
> > much is not happening. The algorithm propagates keys to keys and/or
> > names to names (but not any other combination of those).
> >
> > -d
> >
> > On Fri, Aug 2, 2013 at 10:42 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > > It should. I worked on the issue, and the last time it was checked it
> > > was still working with name propagation. If not, then it is a bug.
> > >
> > > On Fri, Aug 2, 2013 at 3:33 AM, Stuti Awasthi <[email protected]> wrote:
> > >
> > > > Hey Ted,
> > > >
> > > > As suggested, I tried SSVD with Mahout 0.8.
> > > > I think the issue of NamedVector not propagating to the output U
> > > > still persists. Here is what I have done:
> > > >
> > > > 1. Created featureVector using "seq2sparse" with the -nv option.
> > > >    Checked the output; named vectors were created.
> > > > 2. Provided this featureVector to "ssvd" with params "-k 100 -U true
> > > >    -V true". After execution, 3 outputs were generated, namely:
> > > >    sigma, U, V.
> > > > 3. I dumped "U" to check whether the output contained NamedVectors:
> > > >
> > > > mahout-distribution-0.7/bin/mahout seqdumper -i /stuti/SSVD/Output/U | more
> > > >
> > > > Output:
> > > > Key: /File_1: Value: {0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
> > > >
> > > > I did not see NamedVectors being created in the output of ssvd.
> > > > Please point out if I have missed any step in between.
> > > >
> > > > As I wanted to perform clustering, I took the output "U" and
> > > > generated NamedVectors with custom code. The output looks like this:
> > > >
> > > > Key: /File_1: Value: /File_1:{0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
> > > >
> > > > Then I fed this NamedVector file to KMeans to generate 10 clusters,
> > > > using random centroid selection. Finally I dumped the cluster
> > > > output as:
> > > >
> > > > <ClusterId>,<DocumentID1>,<DocumentId2>.....
> > > >
> > > > Please let me know if I have made any mistake in the end-to-end
> > > > execution. Also, I'm not sure why the SSVD output is not generating
> > > > the named vectors, as the issue is fixed.
> > > >
> > > > Please suggest
> > > >
> > > > Regards
> > > > Stuti Awasthi
> > > >
> > > > -----Original Message-----
> > > > From: Ted Dunning [mailto:[email protected]]
> > > > Sent: Thursday, August 01, 2013 8:37 PM
> > > > To: [email protected]
> > > > Subject: Re: How to SSVD output to generate Clusters
> > > >
> > > > On Thu, Aug 1, 2013 at 5:49 AM, Stuti Awasthi <[email protected]> wrote:
> > > >
> > > > > I think there is a problem because of NamedVector as after some
> > > > > search I get this Jira.
> > > > > https://issues.apache.org/jira/browse/MAHOUT-1067
> > > >
> > > > Note also that this bug is fixed in 0.8
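
The workaround described in this thread, copying each row key of U into the
name of that row's vector, can be modelled in a few lines. This is only a
rough sketch of the idea in plain Python, not Mahout code: Row,
name_rows_from_keys and dump are hypothetical stand-ins invented for
illustration, and the printed format merely imitates the seqdumper lines
quoted in this thread. In Mahout itself the equivalent step would presumably
wrap each vector value in a NamedVector.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Row:
    """Stand-in for one (key, vector) record; NOT a Mahout class."""
    key: str
    values: Dict[int, float]
    name: Optional[str] = None  # None models a plain, unnamed vector


def name_rows_from_keys(rows: List[Row]) -> List[Row]:
    """Return copies of the rows whose vector name is set to the row key."""
    return [Row(r.key, dict(r.values), name=r.key) for r in rows]


def dump(row: Row) -> str:
    """Render a row roughly the way the dumps quoted in this thread look."""
    body = ",".join(f"{i}:{v}" for i, v in sorted(row.values.items()))
    prefix = f"{row.name}:" if row.name is not None else ""
    return f"Key: {row.key}: Value: {prefix}{{{body}}}"


# Hypothetical row of U: it carries a key but no vector name.
u_rows = [Row("/File_1", {0: 0.25, 1: 0.5})]
print(dump(u_rows[0]))
print(dump(name_rows_from_keys(u_rows)[0]))
```

The first printed line has no name in front of the braces and the second
one does, which mirrors the difference between the two dumps quoted above.
The copy is made explicitly because, as noted in the thread, keys go to
keys and names go to names, but a key is never turned into a name.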
