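Looks like your input data is not numeric: "����2Yu�3.1_0�����������������osLinux". The InputMapper is barfing trying to convert this into a double. That string looks like the raw bytes of a Lucene index file (segments_1, for instance) rather than a line of text, so I suspect the job is reading the index files you copied into testdata as if they were its input data.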
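If it helps to check the files before launching the job, here is a minimal sketch of the kind of test I mean. It assumes the job wants plain text lines of whitespace-separated numbers; the class and method names (NumericLineCheck, isNumericLine) are made up for illustration and are not part of Mahout.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not Mahout code: checks whether a line of input
// consists only of tokens that Java can parse as doubles.
public class NumericLineCheck {

    // Assumption: values on a line are separated by whitespace.
    private static final Pattern SPACE = Pattern.compile("\\s+");

    /** Returns true if the line is non-empty and every token parses as a double. */
    public static boolean isNumericLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        for (String token : SPACE.split(trimmed)) {
            try {
                Double.parseDouble(token);
            } catch (NumberFormatException e) {
                // Same kind of failure a mapper would hit when handed non-numeric text.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNumericLine("28.78 34.46 31.33")); // numeric tokens
        System.out.println(isNumericLine("_0 osLinux"));        // text from an index file
    }
}
```

Running each line of a candidate input file through a check like this would show quickly whether the file is numeric text or something else, such as a binary index file.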

-----Original Message-----
From: Varun Thacker [mailto:[email protected]]
Sent: Tuesday, September 13, 2011 10:09 AM
To: [email protected]
Subject: Re: Error while running any clustering tasks

Any pointers on where I should be looking?

On Sun, Sep 11, 2011 at 11:03 PM, Varun Thacker <[email protected]> wrote:

> I'm using Mahout 0.5. I am using Lucene (the matching version in the
> pom.xml) to index a tiny data set for testing. This is what the index looks
> like:
> _0.fdt
> _0.fnm
> _0.nrm
> _0.tii
> _0.tvd
> _0.tvx
> segments.gen
> _0.fdx
> _0.frq
> _0.prx
> _0.tis
> _0.tvf
> segments_1
>
> Now I use this command to create vectors from the Lucene index (same as
> the wiki command):
>
> ./mahout lucene.vector --dir /home/varun/myindex/ --field title --dictOut
> /home/varun/myindex/dict.txt --output /home/varun/myindex/out.txt --norm 1
>
> Now I copy and paste the /myindex folder into the /bin/testdata folder, as that
> seems to be the default directory for the data.
>
> To run K-means I use this command:
> ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>
> This is the error which I get: http://pastebin.com/ADPm0Vbx
>
> Am I missing any steps?
>
> Also, on a side note, is there a post on using MinHash in Mahout?
>
>
> --
>
>
> Regards,
> Varun Thacker
> http://varunthacker.wordpress.com
>
>

--
Regards,
Varun Thacker
http://varunthacker.wordpress.com
