I was able to run the tutorials, etc. Now I would like to generate my
own small test.

I have created a data.dat file with these contents:
22 21
19 20
18 22
1 3
3 2

Then I ran: mahout seqdirectory -i ~/data/kmeans/data.dat -o kmeans/seqdir

This created kmeans/seqdir/chunk-0 in my dfs with the following content:
¼/%
        /data.dat22 21
19 20
18 22
1 3
3 2

Next I ran: mahout seq2sparse -i kmeans/seqdir -o kmeans/input

This generated several things in kmeans/input including the
'tfidf/vectors' folder. Inside the vectors folder I get: part-00000
which contains:
øÏân
        /data.dat7org.apache.mahout.math.RandomAccessSparseVectorWritable
     /data.dat@@

It does not seem to have the numeric data at this point.

I am hoping someone can shed some light on how I can get my data point
file into the proper vector format for running mahout kmeans.
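
As a sketch of what "proper vector format" might mean here: each line of
data.dat would be parsed into one numeric vector of two values, rather than
the whole file being tokenized as a single text document (which appears to be
what seqdirectory followed by seq2sparse does). The plain-Java helper below
covers only the parsing step. PointParser is a hypothetical name, not a Mahout
class, and writing the rows out as Mahout vectors in a SequenceFile is assumed
to be the follow-up step and is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Mahout): turns whitespace-separated
// lines such as "22 21" into one double[] per line.
final class PointParser {

    static double[][] parse(String text) {
        List<double[]> rows = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i]);
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        // Same contents as the data.dat file in this post.
        String sample = "22 21\n19 20\n18 22\n1 3\n3 2\n";
        double[][] points = parse(sample);
        // Assumption: each row would then become one Mahout vector
        // written to a SequenceFile for kmeans to read.
        for (double[] p : points) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

With the five lines above this yields five rows of two values each, which is
the shape I would expect kmeans to need for -k 2 to make sense.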

Just FYI, when I run kmeans against that file:

mahout kmeans -i kmeans/input/tfidf/vectors -c kmeans/clusters -o kmeans/output -k 2 -w

I get:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)

which tells me it was unable to find even 1 vector in the given input folder.

Thanks for any comments you provide.
-M@
-- 
Have you thanked a teacher today? ---> http://www.liftateacher.org
