Also, I think I am seeing the issues DB mentioned to me yesterday at the
sparkML meetup (or something similar).


On Wed, Aug 7, 2013 at 1:01 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Thanks, Stuti.
>
> OK, I think there is indeed something going on with the PCA stuff. It may
> require a patch.
>
> -d
>
>
> On Wed, Aug 7, 2013 at 2:12 AM, Stuti Awasthi <[email protected]> wrote:
>
>> I did not use the -q option while running ssvd. Here are the commands
>> I ran:
>>
>> Seq2sparse :
>> mahout-distribution-0.7/bin/mahout seq2sparse -i /stuti/SSVD/data-seq -o
>> /stuti/SSVD/data-vectors -s 5 -ml 50 -nv -ng 3 -n 2 -x 70
>>
>> SSVD
>> hadoop jar mahout-distribution-0.7/mahout-core-0.8-job.jar
>> org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli -i
>> /stuti/SSVD/data-vectors/tf-vectors -o /stuti/SSVD/Output -k 90 -U true -V
>> true --reduceTasks 1
>>
>> Thanks
>> Stuti Awasthi
>>
>> -----Original Message-----
>> From: Dmitriy Lyubimov [mailto:[email protected]]
>> Sent: Wednesday, August 07, 2013 2:14 PM
>> To: [email protected]
>> Subject: RE: How to SSVD output to generate Clusters
>>
>> Thanks Stuti. Yes, it looks like it is not there. Let me run a test. One
>> question: did you use -q 0 or 1 or something else?
>> On Aug 7, 2013 12:18 AM, "Stuti Awasthi" <[email protected]> wrote:
>>
>> > Hey Dmitriy,
>> >
>> > Sorry for replying late.
>> > I have re-run the steps to generate U. Here is the output:
>> >
>> > 1. "seq2sparse" Output: Contains Named Vector
>> > Key: /Description_10: Value:
>> > /Description_10:{554:1.0,54:1.0,514:1.0,322:1.0,247:1.0,91:1.0,127:1.0,
>> > 456:1.0,480:1.0,713:1.0,674:1.0,117:1.0,461:1.0,595:2.0,446:1.0,296:2.0}
>> >
>> > 2. "ssvd" Output of U: Does not contain NamedVector
>> > Key: /Description_10: Value:
>> > {0:0.010564205396743468,1:-0.01989403962804719,2:0.015640314765729225,
>> > 3:0.04031717183780774,4:-0.03325995251075869,5:0.0294201152018514,
>> > 6:0.03834130611889856,7:-0.008686421005328312,8:-0.06164515883823538,
>> > 9:0.03752875772953153,10:0.04739786931946798,11:-0.07912744917669134,
>> > 12:0.020078421275704143,13:-0.04017504785907734,14:0.012539132502559502,
>> > 15:0.0733073647645918,16:-0.02111033727307056,17:0.0799478317610193,
>> > 18:-0.08481960414593219,19:-0.06987848875856222,20:0.036932920091059446,
>> > 21:-0.06949180571421532,22:-0.03447267994522256,23:-0.07104196347181493,
>> > 24:0.022262180555421562,25:-0.0485632586340187,26:-0.05380823388650383,
>> > 27:0.09299533887785207,28:0.0019344239524856396,29:0.002936116541403362,
>> > 30:-0.07249587007236825,31:0.0016026176038041033,32:-0.0711115256224166,
>> > 33:0.06603931206284432,34:0.019226806201249697,35:0.13972781245330326,
>> > 36:0.0787696939450401,37:0.07065356340476747,38:0.08437107545490818,
>> > 39:0.06381670380272558,40:0.046405964753673735,41:0.0601332388594578,
>> > 42:-0.12996454299711707,43:0.10779361589915878,44:-0.06524702754474347,
>> > 45:0.014785171162887613,46:-0.036630574690084586,47:-0.15066656149902793,
>> > 48:0.16190482591405958,49:-0.00869851116149916}
>> >
>> > As you said, keys should propagate to keys and/or names to names.
>> > Any idea what is going wrong, or whether there is a mistake on my
>> > side?
>> >
>> > Thanks
>> > Stuti Awasthi
>> >
>> > -----Original Message-----
>> > From: Dmitriy Lyubimov [mailto:[email protected]]
>> > Sent: Friday, August 02, 2013 11:39 PM
>> > To: [email protected]
>> > Subject: Re: How to SSVD output to generate Clusters
>> >
>> > From eyeballing the code, I don't see a problem. If the rows of A are
>> > NamedVector values, then the rows of U (or U*Sigma or U*Sigma^1/2)
>> > would also retain the names from the rows of A. The output would not
>> > contain NamedVector values for rows that were not NamedVector values
>> > themselves in the input. Does seq2sparse output create NamedVectors as
>> > values?
>> >
>> > Note that if what you want is to have *keys* from seq2sparse (such as
>> > 'file_1') propagated to the name of the NamedVector value in U, that
>> > is not happening. The algorithm propagates keys to keys and/or names
>> > to names (but not any other combination of those).
>> >
>> > -d
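The propagation rule described above (key to key, name to name, nothing
else) can be modelled at the text level. The following is a hypothetical
Python sketch, not Mahout code: it assumes dump lines of the form
"Key: K: Value: V" as quoted in this thread, and the names Row, parse_row
and expected_u_row are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, taken from the dumps quoted in this thread:
#   Key: <key>: Value: <name>:{i:v,...}   (named vector)
#   Key: <key>: Value: {i:v,...}          (plain vector)
LINE = re.compile(r"^Key: (?P<key>.*?): Value: (?P<value>.*)$")


class Row(NamedTuple):
    key: str
    name: Optional[str]  # None models a plain, unnamed vector
    body: str            # the "{index:value,...}" part, kept as text


def parse_row(line: str) -> Row:
    """Split one dump line into key, optional vector name, and vector body."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a 'Key: ...: Value: ...' line: {line!r}")
    value = m.group("value").strip()
    brace = value.find("{")
    if brace < 0:
        raise ValueError(f"no vector body in: {line!r}")
    prefix = value[:brace]
    name = prefix[:-1] if prefix.endswith(":") else prefix
    return Row(m.group("key"), name or None, value[brace:])


def expected_u_row(a_row: Row, u_body: str) -> Row:
    """Model of the stated rule: the key of A's row becomes the key of U's
    row, and the name (if any) becomes the name. A key is never turned
    into a name, and a name is never turned into a key."""
    return Row(a_row.key, a_row.name, u_body)
```

Comparing expected_u_row(a, u.body).name with parse_row(...).name of the
actual U row shows whether a name was lost on the way from A to U.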
>> >
>> >
>> > On Fri, Aug 2, 2013 at 10:42 AM, Dmitriy Lyubimov <[email protected]>
>> > wrote:
>> >
>> > > It should. I worked on the issue, and the last time it was checked,
>> > > name propagation was still working. If not, then it is a bug.
>> > >
>> > >
>> > > On Fri, Aug 2, 2013 at 3:33 AM, Stuti Awasthi <[email protected]>
>> > > wrote:
>> > >
>> > >> Hey Ted,
>> > >>
>> > >> As suggested, I tried SSVD with Mahout 0.8. I think the issue of
>> > >> NamedVector not propagating to the output U still persists.
>> > >> Here is what I have done:
>> > >>
>> > >> 1. Created feature vectors using "seq2sparse" with the -nv option.
>> > >> Checked the output; named vectors were created.
>> > >> 2. Provided these feature vectors to "ssvd" with params "-k 100 -U
>> > >> true -V true". After execution, 3 outputs were generated, namely
>> > >> sigma, U, V.
>> > >> 3. I dumped "U" to check whether the output contained NamedVectors:
>> > >>
>> > >> mahout-distribution-0.7/bin/mahout seqdumper -i /stuti/SSVD/Output/U | more
>> > >> Output:
>> > >> Key: /File_1: Value:
>> > >> {0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
>> > >>
>> > >> I did not see the NamedVector getting created in the output of ssvd.
>> > >> Please point out if I have missed any step in between.
>> > >>
>> > >> As I wanted to perform clustering, I took the output "U" and
>> > >> generated NamedVectors with custom code. The output looks like this:
>> > >> Key: /File_1: Value:
>> > >> /File_1:{0:0.027019746696983288,1:0.006124424321845726,2:0.0334311500858222,.....}
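The custom code itself is not shown in this thread. As a rough text-level
illustration only, the renaming step could look like the Python sketch
below. It assumes dump lines of the form "Key: K: Value: {...}" as quoted
above and copies the key into the name position; name_from_key is a
hypothetical helper, and the sketch does not touch Mahout's NamedVector
class or its sequence files.

```python
def name_from_key(line: str) -> str:
    """Rewrite 'Key: K: Value: {...}' as 'Key: K: Value: K:{...}'.

    Lines whose value already carries a name are returned unchanged,
    so the function can be applied twice without harm.
    """
    text = line.rstrip("\n")
    head, sep, value = text.partition(": Value: ")
    if not sep or not head.startswith("Key: "):
        raise ValueError(f"not a 'Key: ...: Value: ...' line: {line!r}")
    key = head[len("Key: "):]
    value = value.strip()
    if not value.startswith("{"):
        return text  # value already starts with a name
    return f"Key: {key}: Value: {key}:{value}"
```

Using the key as the name is a design choice of this sketch; it mirrors
the output above, where both the key and the vector name are /File_1.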
>> > >>
>> > >> Then I fed this NamedVector file to KMeans to generate 10 clusters,
>> > >> using random centroid selection.
>> > >> Finally I dumped the cluster output as:
>> > >> <ClusterId>,<DocumentID1>,<DocumentId2>.....
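A dump in that shape can be produced by grouping document IDs per
cluster. The Python sketch below is only an illustration under stated
assumptions: the input is taken to be (cluster id, document id) pairs
that have already been read from the clustering output (how they are
read is not covered here), and format_clusters is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def format_clusters(assignments: Iterable[Tuple[int, str]]) -> List[str]:
    """Group document IDs by cluster ID and render one line per cluster
    as '<ClusterId>,<DocumentId1>,<DocumentId2>,...'.

    Clusters are emitted in ascending ID order; documents keep the order
    in which they appear in the input.
    """
    groups = defaultdict(list)
    for cluster_id, doc_id in assignments:
        groups[cluster_id].append(doc_id)
    return [
        ",".join([str(cluster_id), *doc_ids])
        for cluster_id, doc_ids in sorted(groups.items())
    ]
```

The document IDs here would be the vector names, which is why the
clustering step needs named vectors as input in the first place.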
>> > >>
>> > >> Please let me know if I have made any mistake in the end-to-end
>> > >> execution. Also, I am not sure why the SSVD output is not
>> > >> generating the named vectors, as the issue is fixed.
>> > >>
>> > >> Please suggest
>> > >>
>> > >> Regards
>> > >> Stuti Awasthi
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> -----Original Message-----
>> > >> From: Ted Dunning [mailto:[email protected]]
>> > >> Sent: Thursday, August 01, 2013 8:37 PM
>> > >> To: [email protected]
>> > >> Subject: Re: How to SSVD output to generate Clusters
>> > >>
>> > >> On Thu, Aug 1, 2013 at 5:49 AM, Stuti Awasthi
>> > >> <[email protected]>
>> > >> wrote:
>> > >>
>> > >> > I think there is a problem because of NamedVector, as after some
>> > >> > searching I found this Jira:
>> > >> > https://issues.apache.org/jira/browse/MAHOUT-1067
>> > >> >
>> > >>
>> > >> Note also that this bug is fixed in 0.8
>> > >>
>> > >
>> > >
>> >
>>
>
>
