Yeah, sorry Jeff, I neglected to say that I am trying to cluster a set of 1400 text documents from a directory, and I'm not using the synthetic dataset. The input data, i.e. data/trecdata, is a directory of raw text files; the commands I used to create the vectors from it are below.
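
Before the Mahout commands, a rough sanity check on the input directory might help rule out bad input files. This is only a sketch using standard find/grep/wc: check_input_dir is a hypothetical helper name, not part of Mahout, and it assumes that "-c ascii" means the files are expected to be plain ASCII.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): summarize a directory of raw
# text files. Prints three numbers on one line:
#   <total files> <empty files> <files containing non-ASCII or control bytes>
# Assumes file names do not contain newlines.
check_input_dir() {
    dir=$1
    # Count all regular files under the directory.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Count zero-length files, which would yield empty documents.
    empty=$(find "$dir" -type f -size 0 | wc -l | tr -d ' ')
    # LC_ALL=C makes grep compare raw bytes; the bracket expression matches
    # any byte that is neither printable ASCII (space through ~) nor whitespace.
    nonascii=$(LC_ALL=C find "$dir" -type f \
        -exec grep -l '[^ -~[:space:]]' {} + | wc -l | tr -d ' ')
    echo "$total $empty $nonascii"
}

# Example call (path taken from the seqdirectory command):
#   check_input_dir /data/trecdata
```

If the first number is not the expected 1400, or the other two are non-zero, the input files would be the first thing to look at before blaming the vectors.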
I'll run the clustering on the synthetic dataset to see if there is something wrong with the input vectors.

To create the sequence files:

  ./mahout seqdirectory -i /data/trecdata -o /data/trecdata-seqfiles -c ascii -chunk 64 -prefix TREC

and then to create the sparse matrix:

  ./mahout seq2sparse -s 2 -a org.apache.lucene.analysis.standard.StandardAnalyzer -chunk 100 -i /home/w007dhc/data/trecdata-seqfiles/chunk-0 -o /home/w007dhc/data/trecdata-vectors -md 1 -x 75 -wt TFIDF -n 0 -w

-----
cheers
Delroy

--
View this message in context: http://lucene.472066.n3.nabble.com/k-means-invocation-exception-still-not-resolved-tp835261p835632.html
Sent from the Mahout User List mailing list archive at Nabble.com.
