On Wed, Aug 7, 2013 at 10:44 PM, Dmitriy Lyubimov <[email protected]> wrote:

> @Stuti:
>
> OK, so I added additional tests (there was a small bug in the test), and
> in local tests both the keys and the named vector names propagate on all
> execution paths. So I cannot confirm the no-propagation problem (this is
> on 0.9 trunk).
>

This test modification is in MAHOUT-1306 and has already been pushed to trunk.

I know this probably doesn't resolve your particular situation, but I don't
see a problem with SSVD itself with respect to name propagation, of either
keys or named vector values. I know Pat was doing this successfully. Let me
know if you find something.
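To make the propagation rule discussed in this thread concrete (keys go to keys and vector names go to vector names, with no key-to-name crossover), here is a minimal sketch. It uses a hypothetical `Row` record as a stand-in for one (key, vector) SequenceFile record. It is not Mahout's `NamedVector` API and does not run SSVD. It only models which labels carry over to a row of U, plus the extra post-processing step that copies a key such as `/File_1` into the vector name.

```java
import java.util.Arrays;

public class NamePropagationSketch {

    // Hypothetical stand-in for one (key, vector) record in a SequenceFile.
    // name == null models a plain vector; non-null models a named vector.
    // This is NOT Mahout's API.
    record Row(String key, String name, double[] values) {}

    // Models the rule described in this thread: the output row of U keeps the
    // input row's key and name. uRow stands in for the values SSVD computes.
    static Row propagate(Row input, double[] uRow) {
        return new Row(input.key(), input.name(), uRow);
    }

    // Models the extra post-processing step: copy the key into the vector name.
    static Row keyAsName(Row row) {
        return new Row(row.key(), row.key(), row.values());
    }

    public static void main(String[] args) {
        // Unnamed input row: only the key survives; no name appears in U.
        Row plain = new Row("/File_1", null, new double[] {1.0, 0.0, 2.0});
        Row uPlain = propagate(plain, new double[] {0.027, 0.006, 0.033});
        System.out.println(uPlain.key() + " name=" + uPlain.name()); // /File_1 name=null

        // Named input row: the name survives into U.
        Row named = new Row("/File_1", "/File_1", new double[] {1.0, 0.0, 2.0});
        Row uNamed = propagate(named, new double[] {0.027, 0.006, 0.033});
        System.out.println(uNamed.key() + " name=" + uNamed.name()); // /File_1 name=/File_1

        // Workaround for unnamed input: name the U row after its key explicitly.
        Row fixed = keyAsName(uPlain);
        System.out.println(fixed.name() + " " + Arrays.toString(fixed.values()));
    }
}
```

The point of the model is that an unnamed input row can never gain a name inside SSVD. If rows of U need names for later clustering output, the names must either already be on the input vectors or be copied from the keys in a separate step.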




> > Sent: Friday, August 02, 2013 11:39 PM
>>>>> > To: [email protected]
>>>>> > Subject: Re: How to SSVD output to generate Clusters
>>>>> >
>>>>> > By eyeballing the code, I don't think I see a problem. If rows of A
>>>>> > are NamedVector values, then rows of U (or U*Sigma or U*Sigma^1/2)
>>>>> > would also retain names from the values of the rows of A. Output would
>>>>> > not contain NamedVector values for rows that were not NamedVector
>>>>> > values themselves in the input. Does seq2sparse output create
>>>>> > NamedVectors as values?
>>>>> >
>>>>> > Note that if what you want is to have *keys* from seq2sparse (such as
>>>>> > 'file_1') propagated to the name of the named vector value in U, that
>>>>> > is not happening. The algorithm propagates keys to keys and/or names to
>>>>> > names (but no other combination of those).
>>>>> >
>>>>> > -d
>>>>> >
>>>>> >
>>>>> > On Fri, Aug 2, 2013 at 10:42 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>> >
>>>>> > > It should. I worked on the issue, and the last time it was checked,
>>>>> > > name propagation was still working. If not, then it is a bug.
>>>>> > >
>>>>> > >
>>>>> > > On Fri, Aug 2, 2013 at 3:33 AM, Stuti Awasthi <[email protected]> wrote:
>>>>> > >
>>>>> > >> Hey Ted,
>>>>> > >>
>>>>> > >> As suggested, I tried SSVD with Mahout 0.8. I think the issue of
>>>>> > >> NamedVector not propagating to the output U still persists.
>>>>> > >> Here is what I have done:
>>>>> > >>
>>>>> > >> 1. Created featureVector using "seq2sparse" with the -nv option.
>>>>> > >>    Checked the output; named vectors were created.
>>>>> > >> 2. Provided this featureVector to "ssvd" with params "-k 100 -U
>>>>> > >>    true -V true". After execution, 3 outputs were generated, namely:
>>>>> > >>    sigma, U, V.
>>>>> > >> 3. Dumped "U" to check whether the output contained NamedVectors:
>>>>> > >>
>>>>> > >>    mahout-distribution-0.7/bin/mahout seqdumper -i /stuti/SSVD/Output/U | more
>>>>> > >>
>>>>> > >> Output:
>>>>> > >> Key: /File_1: Value: {0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
>>>>> > >>
>>>>> > >> I did not see NamedVectors created in the output of ssvd.
>>>>> > >> Please point out if I have missed any step in between.
>>>>> > >>
>>>>> > >> As I wanted to perform clustering, I took the output "U" and
>>>>> > >> generated NamedVectors from it with custom code. The output looks
>>>>> > >> like this:
>>>>> > >> Key: /File_1: Value: /File_1:{0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
>>>>> > >>
>>>>> > >> Then I fed this NamedVector file to KMeans to generate 10 clusters,
>>>>> > >> using random centroid selection. Finally, I dumped the cluster
>>>>> > >> output as:
>>>>> > >> <ClusterId>,<DocumentId1>,<DocumentId2>,...
>>>>> > >>
>>>>> > >> Please let me know if I have made any mistake in the end-to-end
>>>>> > >> execution. I'm also not sure why the SSVD output is not generating
>>>>> > >> named vectors, since the issue is fixed.
>>>>> > >>
>>>>> > >> Please suggest
>>>>> > >>
>>>>> > >> Regards
>>>>> > >> Stuti Awasthi
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >> -----Original Message-----
>>>>> > >> From: Ted Dunning [mailto:[email protected]]
>>>>> > >> Sent: Thursday, August 01, 2013 8:37 PM
>>>>> > >> To: [email protected]
>>>>> > >> Subject: Re: How to SSVD output to generate Clusters
>>>>> > >>
>>>>> > >> On Thu, Aug 1, 2013 at 5:49 AM, Stuti Awasthi <[email protected]> wrote:
>>>>> > >>
>>>>> > >> > I think there is a problem because of NamedVector; after some
>>>>> > >> > searching I found this JIRA issue:
>>>>> > >> > https://issues.apache.org/jira/browse/MAHOUT-1067
>>>>> > >> >
>>>>> > >>
>>>>> > >> Note also that this bug is fixed in 0.8.
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >
>>>>> > >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
