On Sep 11, 2011, at 1:33 PM, Varun Thacker wrote:

> I'm using Mahout 0.5. I am using Lucene (the matching version in the
> pom.xml) to index a tiny data set for testing. This is what the index looks
> like:
> _0.fdt
> _0.fnm
> _0.nrm
> _0.tii
> _0.tvd
> _0.tvx
> segments.gen
> _0.fdx
> _0.frq
> _0.prx
> _0.tis
> _0.tvf
> segments_1
>
> Now I use this command to create vectors from the Lucene index (same as
> the wiki command):
>
> ./mahout lucene.vector --dir /home/varun/myindex/ --field title --dictOut
> /home/varun/myindex/dict.txt --output /home/varun/myindex/out.txt --norm 1
>
> Now I copy and paste the /myindex folder to the /bin/testdata folder, as that
> seems to be the default dir. for the data.
>
> To run K-means I use this command:
> ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

I don't think this is what you do to run K-means. You should be able to do:

    ./mahout kmeans --input ...

> This is the error which I get: http://pastebin.com/ADPm0Vbx
>
> Am I missing any steps?
>
> Also on a side note is there a post on using MinHash in Mahout?
>
> --
> Regards,
> Varun Thacker
> http://varunthacker.wordpress.com

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com
