Yeah, sorry Jeff,
I neglected to say that I am trying to cluster a set of 1400 text documents
from a directory; I'm not using the synthetic dataset. The commands I used
to create the vectors are below. The input data, i.e. data/trecdata, is a
directory of raw text files.

I'll also run the clustering on the synthetic dataset to see whether
something is wrong with my input vectors.

./mahout seqdirectory \
  -i /data/trecdata \
  -o /data/trecdata-seqfiles \
  -c ascii \
  -chunk 64 \
  -prefix TREC

and then, to create the sparse matrix:
./mahout seq2sparse \
  -s 2 \
  -a org.apache.lucene.analysis.standard.StandardAnalyzer \
  -chunk 100 \
  -i /home/w007dhc/data/trecdata-seqfiles/chunk-0 \
  -o /home/w007dhc/data/trecdata-vectors \
  -md 1 -x 75 -wt TFIDF -n 0 -w
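
Since the input is a plain directory of raw text files, a quick sanity check on it before vectorizing might look something like the sketch below. This is only an illustration: check_docs is a made-up helper, not a Mahout command, and it assumes a POSIX shell with find, wc and tr available. It just counts the files and flags zero-length ones, which would end up as empty documents.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): summarize a directory of raw
# text files before running seqdirectory on it.
check_docs() {
    dir="$1"
    # Count all regular files under the directory.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Count zero-length files, which would yield empty documents.
    empty=$(find "$dir" -type f -size 0 | wc -l | tr -d ' ')
    echo "total=$total empty=$empty"
}

# Example call (path taken from the commands above):
#   check_docs /data/trecdata
```

If the total is not the 1400 documents expected, or the empty count is non-zero, the input directory is worth a closer look before suspecting the clustering step.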



-----
--cheers
Delroy