Yes, how do I run canopy/kmeans on the USigma output? What is the connecting step? Please advise.

Thanks,
Rajesh

On May 30, 2013 10:09 PM, "Dmitriy Lyubimov" <[email protected]> wrote:

> I.e. I guess you want to run kmeans directly on USigma output.
> On May 30, 2013 9:37 AM, "Dmitriy Lyubimov" <[email protected]> wrote:
>
> > I believe this flow describes how to use Lanczos SVD in Mahout to arrive at the same reduction as ssvd already provides with the pca and USigma options in one step. This flow is irrelevant when working with ssvd; it already does it all internally for you.
> > On May 30, 2013 5:45 AM, "Rajesh Nikam" <[email protected]> wrote:
> >
> >> Hi Suneel/Dmitriy,
> >>
> >> I got mahout-examples-0.8-SNAPSHOT-job.jar compiled from trunk. Now I have the -us param you mentioned working for the input set.
> >>
> >> Steps followed are:
> >>
> >> mahout arff.vector --input /mnt/cluster/t/PE_EXE/input-set.arff --output /user/hadoop/t/input-set-vector/ --dictOut /mnt/cluster/t/input-set-dict
> >>
> >> hadoop jar mahout-examples-0.8-SNAPSHOT-job.jar org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli --input /user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/ -k 50 --reduceTasks 2 -U true -V false -us true -ow
> >>
> >> I am not able to understand what needs to be provided as input to cleansvd/transpose/matrixmult as mentioned on the following page, and which of U/V/USigma needs to be used and how.
> >>
> >> Again, how do I find out which features ended up in the reduced matrix?
> >>
> >> https://cwiki.apache.org/MAHOUT/dimensional-reduction.html
> >>
> >> At a high level, the steps we're going to perform are:
> >>
> >> bin/mahout svd (original -> svdOut)
> >> bin/mahout cleansvd ...
> >> bin/mahout transpose svdOut -> svdT
> >> bin/mahout transpose original -> originalT
> >> bin/mahout matrixmult originalT svdT -> newMatrix
> >> bin/mahout kmeans newMatrix
> >>
> >> Thanks,
> >>
> >> Rajesh
> >>
> >> On Mon, May 27, 2013 at 11:31 AM, Suneel Marthi <[email protected]> wrote:
> >>
> >> > Ahha, I see your problem now.
> >> >
> >> > The additional line in trunk was added as part of Mahout-1097 (long after the Mahout-0.7 release), and hence you wouldn't see the change in the mahout-examples-0.7-job.jar that you are working off of. This fix is presently available in trunk (and will be part of Mahout-0.8).
> >> >
> >> > I would recommend working off of trunk for now and you should be good.
> >> >
> >> > ________________________________
> >> > From: Rajesh Nikam <[email protected]>
> >> > To: [email protected]
> >> > Sent: Monday, May 27, 2013 1:52 AM
> >> > Subject: Re: Fwd: Re: convert input for SVD
> >> >
> >> > Hi Dmitriy / Suneel,
> >> >
> >> > You are pointing me to the correct solution. However, I see different options in the source code downloaded from trunk (mahout-trunk.zip) and in mahout-examples-0.7-job.jar.
> >> >
> >> > Could you please verify the same at your end.
> >> >
> >> > ==>> from mahout-trunk.zip <<==
> >> >
> >> > addOption("uHalfSigma", "uhs", "Compute U * Sigma^0.5", String.valueOf(false));
> >> > addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false));
> >> > addOption("computeV", "V", "compute V (true/false)", String.valueOf(true));
> >> >
> >> > ==>> mahout-examples-0.7-job.jar <<==
> >> >
> >> > addOption("uHalfSigma", "uhs", "Compute U as UHat=U x pow(Sigma,0.5)", String.valueOf(false));
> >> > addOption("computeV", "V", "compute V (true/false)", String.valueOf(true));
> >> > addOption("vHalfSigma", "vhs", "compute V as VHat= V x pow(Sigma,0.5)", String.valueOf(false));
> >> >
> >> > Thanks,
> >> > Rajesh
> >> >
> >> > On Fri, May 24, 2013 at 10:48 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >
> >> > > "ssvd -us true...." should do this. Suneel says it still works on trunk.
> >> > >
> >> > > On Fri, May 24, 2013 at 9:38 AM, Rajesh Nikam <[email protected]> wrote:
> >> > >
> >> > > > Thanks Dmitriy & Suneel for comments. As you suggested I need to use U * Sigma.
> >> > > >
> >> > > > It means Need to get multiplication of these matrices.
> >> > > >
> >> > > > Which Mahout props to use for this?
> >> > > >
> >> > > > Other question was how to get features that are selected in U?
> >> > > > On May 24, 2013 8:45 PM, "Suneel Marthi" <[email protected]> wrote:
> >> > > >
> >> > > > > Rajesh,
> >> > > > >
> >> > > > > I am working off of trunk and this works fine.
> >> > > > >
> >> > > > > As Dmitriy says u do need USigma.
> >> > > > >
> >> > > > > It would help to paste the entire stacktrace you are seeing with MatrixColumnMeansJob.
> >> > > > >
> >> > > > > If you are still seeing an issue, I would suggest that you work off of trunk.
> >> > > > >
> >> > > > > ________________________________
> >> > > > > From: Dmitriy Lyubimov <[email protected]>
> >> > > > > To: [email protected]
> >> > > > > Sent: Friday, May 24, 2013 9:52 AM
> >> > > > > Subject: Re: Fwd: Re: convert input for SVD
> >> > > > >
> >> > > > > I think the last time I verified this flow was as of https://issues.apache.org/jira/browse/MAHOUT-1097. It was working then. Did not look at it since.
> >> > > > > On May 24, 2013 6:42 AM, "Dmitriy Lyubimov" <[email protected]> wrote:
> >> > > > >
> >> > > > > > Rajesh, you will get more help if you stay on the list.
> >> > > > > >
> >> > > > > > You do need the U * Sigma output. There is no substitute.
> >> > > > > >
> >> > > > > > If this option is indeed no longer there, I have no knowledge of it. Maybe there was some work committed that screwed that, but at the moment I have no time to look at it. Obviously it was there at the time the documentation was written. I guess you may obtain an earlier snapshot as an interim solution if it is indeed the case.
> >> > > > > >
> >> > > > > > ---------- Forwarded message ----------
> >> > > > > > From: "Rajesh Nikam" <[email protected]>
> >> > > > > > Date: May 24, 2013 3:20 AM
> >> > > > > > Subject: Re: convert input for SVD
> >> > > > > > To: <[email protected]>
> >> > > > > > Cc:
> >> > > > > >
> >> > > > > > > Hello Dmitriy,
> >> > > > > > >
> >> > > > > > > Thanks for the reply.
> >> > > > > > >
> >> > > > > > > I see a similar discussion at the following link, where I see your reply.
> >> > > > > > >
> >> > > > > > > http://www.searchworkings.org/forum/-/message_boards/view_message/517870#_19_message_519704
> >> > > > > > >
> >> > > > > > > I also have the same problem: I need to apply dimensionality reduction and use a clustering algo on the reduced features.
> >> > > > > > >
> >> > > > > > > It seems the parameters for ssvd have changed from those mentioned in SSVD-CLI.pdf. It no longer shows -us as a parameter.
> >> > > > > > >
> >> > > > > > > I am using mahout-examples-0.7-job.jar
> >> > > > > > >
> >> > > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -pca true -U true -V false -us true -ow -q 1
> >> > > > > > >
> >> > > > > > > Giving the option "-pca true" produces an error:
> >> > > > > > >
> >> > > > > > > at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> >> > > > > > > at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> >> > > > > > >
> >> > > > > > > So I removed it.
> >> > > > > > >
> >> > > > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -U true -V false -us true -ow -q 1
> >> > > > > > >
> >> > > > > > > With the above command I get: "Unexpected -us while processing Job-Specific Options."
> >> > > > > > >
> >> > > > > > > I tried with "-U false -V false -uhs true" it just generated sigma file as expected however no "Usigma"
> >> > > > > > >
> >> > > > > > > hadoop fs -lsr /user/hadoop/t/PE_EXE/input-set-svd/
> >> > > > > > >
> >> > > > > > > -rw-r--r-- 2 hadoop supergroup 1712 2013-05-24 15:34 /user/hadoop/t/PE_EXE/input-set-svd/sigma
> >> > > > > > >
> >> > > > > > > Then with "-U true -V false -uhs true" output dir U is created.
> >> > > > > > >
> >> > > > > > > drwxr-xr-x - hadoop supergroup 0 2013-05-24 15:39 /user/hadoop/t/PE_EXE/input-set-svd/U
> >> > > > > > > -rw-r--r-- 2 hadoop supergroup 1712 2013-05-24 15:39 /user/hadoop/t/PE_EXE/input-set-svd/sigma
> >> > > > > > >
> >> > > > > > > My problem is how to use these U/V/sigma file as input to canopy/kmeans ?
> >> > > > > > >
> >> > > > > > > How to identify which important features from U/Sigma that are retained in dimensionality reduction ?
> >> > > > > > >
> >> > > > > > > Thanks in Advance !
> >> > > > > > > Rajesh
> >> > > > > > >
> >> > > > > > > On Fri, May 24, 2013 at 7:01 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > > > > > >
> >> > > > > > > > https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17&modificationDate=1349999085000 :
> >> > > > > > > >
> >> > > > > > > > "In most cases where you might be looking to reduce dimensionality while retaining variance, you probably need combination of options -pca true -U false -V false -us true.
> >> > > > > > > >
> >> > > > > > > > See §3 for details"
> >> > > > > > > >
> >> > > > > > > > On Thu, May 23, 2013 at 6:24 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > > > > > > >
> >> > > > > > > > > Also, for the dimensionality reduction it is important among other things to re-center your input first, which is why you also want "-pca true".
> >> > > > > > > > >
> >> > > > > > > > > On Thu, May 23, 2013 at 6:23 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > > > > > > > >
> >> > > > > > > > >> did you specify -us option? SSVD by default produces only U, V and Sigma. but it can produce more, e.g. U*Sigma, U*sqrt(Sigma) etc. if you ask for it. And, alternatively, you can suppress any of U, V (you can't suppress sigma but that doesn't cost anything in space anyway).
> >> > > > > > > > >>
> >> > > > > > > > >> On Thu, May 23, 2013 at 6:20 PM, Rajesh Nikam <[email protected]> wrote:
> >> > > > > > > > >>
> >> > > > > > > > >>> I got all three U, V & sigma from ssvd, however which to use as input to canopy?
> >> > > > > > > > >>> On May 24, 2013 6:47 AM, "Dmitriy Lyubimov" <[email protected]> wrote:
> >> > > > > > > > >>>
> >> > > > > > > > >>> > I think you want U*Sigma
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > What you want is ssvd ... -pca true ... -us true ...
> >> > > > > > > > >>> > see the manual
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > On Thu, May 23, 2013 at 6:07 PM, Rajesh Nikam <[email protected]> wrote:
> >> > > > > > > > >>> >
> >> > > > > > > > >>> > > Sorry for the confusion. Here the number of clusters is decided by canopy. With the data as it is, it has 60 to 70 clusters.
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > My question is which part of the ssvd output (U, V, Sigma) should be used as input to canopy?
> >> > > > > > > > >>> > > On May 24, 2013 3:56 AM, "Ted Dunning" <[email protected]> wrote:
> >> > > > > > > > >>> > >
> >> > > > > > > > >>> > > > Rajesh,
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > This is very confusing.
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > You have 1500 things that you are clustering into more than 1400 clusters.
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > There is no way for most of these clusters to have >1 member just because there aren't enough clusters compared to the items.
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > Is there a typo here?
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > On Thu, May 23, 2013 at 5:34 AM, Rajesh Nikam <[email protected]> wrote:
> >> > > > > > > > >>> > > >
> >> > > > > > > > >>> > > > > Hi,
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > I have an input test set of 1500 instances with 1000+ features. I want to do SVD to reduce the features. I followed the steps below, which generate 1400+ clusters; 99% of the clusters contain 1 instance :(
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > Please let me know what is wrong in the steps below -
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout arff.vector --input /mnt/cluster/t/input-set.arff --output /user/hadoop/t/input-set-vector/ --dictOut /mnt/cluster/t/input-set-dict
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout ssvd --input /user/hadoop/t/input-set-vector/ --output /user/hadoop/t/input-set-svd/ -k 200 --reduceTasks 2 -ow
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout canopy -i /user/hadoop/t/input-set-svd/U -o /user/hadoop/t/input-set-canopy-centroids -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure -t1 0.001 -t2 0.002
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout kmeans -i /user/hadoop/t/input-set-svd/U -c /user/hadoop/t/input-set-canopy-centroids/clusters-0-final -cl -o /user/hadoop/t/input-set-kmeans-clusters -ow -x 10 -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > mahout clusterdump -dt sequencefile -i /user/hadoop/t/input-set-kmeans-clusters/clusters-1-final/ -n 20 -b 100 -o /mnt/cluster/t/cdump-input-set.txt -p /user/hadoop/t/input-set-kmeans-clusters/clusteredPoints/ --evaluate
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > Thanks in advance !
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > Rajesh
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > On Wed, May 22, 2013 at 2:18 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > > > > > > > >>> > > > >
> >> > > > > > > > >>> > > > > > PPS As far as the tool for arff, i am frankly not sure. but it sounds like you've already solved this.
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > > > On Tue, May 21, 2013 at 1:41 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > > > > > > > >>> > > > > >
> >> > > > > > > > >>> > > > > > > ps as far as U, V data "close to zero", yes that's what you'd expect.
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > > Here, by "close to zero" it still means much bigger than a rounding error of course. E.g. 1E-12 is indeed a small number, and 1E-16 to 1E-18 would be indeed "close to zero" for the purposes of singularity. 1E-2..1E-5 are actually quite "sizeable" numbers by the scale of IEEE 754 arithmetic.
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > > U and V are orthonormal (which means their column vectors have a Euclidean norm of 1). Note that for large m and n (large inputs) they are also extremely skinny. The larger the input is, the smaller the elements of U and/or V are going to be.
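
A rough worked check of the orthonormality argument above, offered as a sketch rather than something stated in the thread. The symbols are assumptions introduced here: m is the number of input rows and u_{ij} is an entry of column j of U. The value m = 1500 is the instance count quoted elsewhere in this thread. Unit column norm pins down the root-mean-square size of an entry:

```latex
\sum_{i=1}^{m} u_{ij}^{2} = 1
\quad\Longrightarrow\quad
\sqrt{\frac{1}{m}\sum_{i=1}^{m} u_{ij}^{2}} = \frac{1}{\sqrt{m}},
\qquad
m = 1500:\ \frac{1}{\sqrt{1500}} \approx 0.026 .
```

So entries around 1E-2 are about the scale one would expect for a matrix with 1500 rows, which is broadly the scale of the largest entries in the vectordump output quoted further down.
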
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > > On Tue, May 21, 2013 at 8:48 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > > > > > > > >>> > > > > > >
> >> > > > > > > > >>> > > > > > >> Sounds like dimensionality reduction to me. You may want to use ssvd -pca
> >> > > > > > > > >>> > > > > > >>
> >> > > > > > > > >>> > > > > > >> Apologies for brevity. Sent from my Android phone.
> >> > > > > > > > >>> > > > > > >> -Dmitriy
> >> > > > > > > > >>> > > > > > >> On May 21, 2013 6:27 AM, "Rajesh Nikam" <[email protected]> wrote:
> >> > > > > > > > >>> > > > > > >>
> >> > > > > > > > >>> > > > > > >>> Hello Ted,
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> Thanks for the reply.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I have started exploring SVD based on the mention that it could help to drop features which are not relevant for clustering.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> My objective is to reduce the number of features before passing them to clustering and just keep the important features.
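
On the question of which features are kept, a short derivation under standard SVD notation may help. It is a sketch and the symbols are assumptions introduced here, not taken from the messages: A is the m x n input matrix (mean-centered when -pca true is used, as described elsewhere in this thread), and U, Sigma, V are its SVD factors, which ssvd computes approximately and truncates to k columns.

```latex
A = U \Sigma V^{\top}, \qquad V^{\top} V = I
\quad\Longrightarrow\quad
A V = U \Sigma ,
\qquad
(U \Sigma)_{ij} = \sum_{l=1}^{n} a_{il}\, v_{lj} .
```

Read this way, U * Sigma is already the input projected onto the leading right singular vectors, so no separate multiplication step is needed. No original feature is selected or dropped: each reduced coordinate j is a weighted combination of all n original features, with the weights in column j of V. Looking at the largest-magnitude entries of that column is one way to see which original features contribute most.
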
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> arff/csv ==> ssvd (for dimensionality reduction) ==> clustering
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> Could you please illustrate the mahout props needed to join the above pipeline.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I think Lanczos SVD needs to be used for an mxm matrix.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I have tried ssvd: I used arff.vector to convert arff/csv to a vector file, which is then given as input to ssvd, and then dumped U, V and sigma using vectordump.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> I see most of the values dumped are near 0. I don't understand whether this is correct or not.
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> {0:0.01066724825049657,1:0.016715498597386844,2:2.0187750952311708E-4,3:3.401020567221039E-4,4:-1.2388403347280688E-4,5:6.41502463540719E-5,6:-1.359187582538833E-4,7:6.329813140445419E-5,8:1.670015585746444E-4,9:3.5415113034592744E-4,10:7.108868213280763E-4,11:0.020553517552052456,12:-0.015118680942548916,13:0.007981746711271956,14:-0.003251236468768259,15:0.0038075014396303053,16:-0.0010925318534013683,17:-0.0026943024876179833,18:-0.001744794617721648,19:-0.0024528466548735714}
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> {0:0.029978614322360833,1:-0.01431521245087889,2:1.3318592088199427E-4,3:1.495356283071516E-4,4:8.762709213918985E-5,5:1.2765191352425177E-
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> Thanks,
> >> > > > > > > > >>> > > > > > >>> Rajesh
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> On Tue, May 21, 2013 at 11:35 AM, Ted Dunning <[email protected]> wrote:
> >> > > > > > > > >>> > > > > > >>>
> >> > > > > > > > >>> > > > > > >>> > Are you using Lanczos instead of SSVD for a reason?
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>> > On Mon, May 20, 2013 at 4:13 AM, Rajesh Nikam <[email protected]> wrote:
> >> > > > > > > > >>> > > > > > >>> >
> >> > > > > > > > >>> > > > > > >>> > > Hello,
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > I have arff / csv file containing input data that I want to pass to svd : Lanczos Singular Value Decomposition.
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > Which tool to use to convert it to required format ?
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > Thanks in Advance !
> >> > > > > > > > >>> > > > > > >>> > >
> >> > > > > > > > >>> > > > > > >>> > > Thanks,
> >> > > > > > > > >>> > > > > > >>> > > Rajesh
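
One possible answer to the question at the top of this thread, written as a dry-run sketch rather than a verified recipe. It reuses the ssvd, canopy and kmeans invocations quoted in the thread and only changes the clustering input from U to the U * Sigma output. Several things here are assumptions and not from the thread: the `USigma` subdirectory name (guessed by analogy with the `U` and `sigma` entries in the listing quoted earlier), the `RUN`, `K`, `T1` and `T2` variables, and the placeholder threshold values, which would need tuning on real data.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
# Export RUN="" (empty) to actually run the commands on a cluster.
RUN="${RUN-echo}"

VECTORS=/user/hadoop/t/input-set-vector
SVD_OUT=/user/hadoop/t/input-set-svd
# Assumption: "-us true" writes U * Sigma into a USigma subdirectory,
# by analogy with the U and sigma entries listed in the thread.
USIGMA="$SVD_OUT/USigma"
CENTROIDS=/user/hadoop/t/input-set-canopy-centroids
CLUSTERS=/user/hadoop/t/input-set-kmeans-clusters
# Distance measure taken from the original commands; another Mahout
# measure can be substituted if it suits the reduced data better.
DM=org.apache.mahout.common.distance.TanimotoDistanceMeasure
K=50      # reduced dimensionality, as in the latest ssvd run in the thread
T1=0.5    # placeholder canopy thresholds (hypothetical values, T1 > T2)
T2=0.25

# 1. Reduce dimensionality; option combination quoted from SSVD-CLI.pdf.
$RUN mahout ssvd --input "$VECTORS" --output "$SVD_OUT" -k "$K" \
  --reduceTasks 2 -pca true -U false -V false -us true -ow

# 2. Seed centroids with canopy, reading the reduced rows directly.
$RUN mahout canopy -i "$USIGMA" -o "$CENTROIDS" -dm "$DM" -t1 "$T1" -t2 "$T2"

# 3. Run k-means on the same reduced rows, starting from the canopy centroids.
$RUN mahout kmeans -i "$USIGMA" -c "$CENTROIDS/clusters-0-final" -cl \
  -o "$CLUSTERS" -ow -x 10 -dm "$DM"
```

With this layout there is no separate connecting step: the directory written by `-us true` is fed straight to canopy and kmeans. The thresholds in the original commands (0.001 and 0.002) are very small, which may be one reason nearly every point ended up in its own cluster; canopy clustering also conventionally takes T1 larger than T2.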
