I'm using Mahout 0.5. I am using Lucene (the matching version in the pom.xml) to index a tiny data set for testing. This is what the index looks like:

_0.fdt _0.fnm _0.nrm _0.tii _0.tvd _0.tvx segments.gen
_0.fdx _0.frq _0.prx _0.tis _0.tvf segments_1

Now I use this command to create vectors from the Lucene index (the same command as on the wiki):

./mahout lucene.vector --dir /home/varun/myindex/ --field title --dictOut /home/varun/myindex/dict.txt --output /home/varun/myindex/out.txt --norm 1

Next I copy the /myindex folder into the /bin/testdata folder, as that seems to be the default directory for the data.

To run K-means I use this command:

./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

This is the error I get: http://pastebin.com/ADPm0Vbx

Am I missing any steps?

Also, on a side note, is there a post on using MinHash in Mahout?

-- 
Regards,
Varun Thacker
http://varunthacker.wordpress.com
