Re: k-means invocation exception still not resolved

Jeff Eastman Fri, 21 May 2010 17:01:41 -0700

Ok, your earlier post was about synthetic control and this clearly isn't that. When you run seq2sparse with the TFIDF option, the output vectors are actually put into <output>/tfidf/vectors/, not <output> or even <output>/vectors/. I suggest you look at examples/bin/build-reuters.sh. When you do, you will see that the output file spec of seq2sparse was:


-o ./examples/bin/work/reuters-out-seqdir-sparse


... and notice that the input file spec of kmeans follows the above pattern:

-i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf/vectors/
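
Applied to the commands quoted below, the same pattern would presumably make the kmeans input the seq2sparse -o directory with tfidf/vectors/ appended. A minimal sketch, assuming the TFIDF layout described above; the helper name kmeans_input_dir is made up for illustration and is not part of Mahout:

```shell
# Hypothetical helper (not a Mahout command): derive the kmeans -i path
# from the directory that was passed to seq2sparse as -o.
kmeans_input_dir() {
  # Drop one trailing slash, if present, then append the TFIDF vector subdirectory.
  printf '%s/tfidf/vectors/\n' "${1%/}"
}

# The -o value from the seq2sparse command in the quoted message.
SEQ2SPARSE_OUT=/home/w007dhc/data/trecdata-vectors

KMEANS_IN=$(kmeans_input_dir "$SEQ2SPARSE_OUT")
echo "$KMEANS_IN"
```

The printed path is what would go after -i on the kmeans command line. It is worth listing that directory first to confirm the vectors are actually there.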



On 5/21/10 4:38 PM, Delroy Cameron wrote:

yeah sorry Jeff,
i neglected to say that i am trying to cluster a set of 1400 text documents
from a directory and i'm not using the synthetic dataset. here are the
commands i used to create the vectors
the input data i.e. data/trecdata is a directory of raw text files

i'll run the clustering on the synthetic dataset to see if there is
something wrong with the input vectors.

./mahout seqdirectory
-i /data/trecdata
-o /data/trecdata-seqfiles
-c ascii
-chunk 64
-prefix TREC

and then to create the sparse matrix
./mahout seq2sparse
-s 2
-a org.apache.lucene.analysis.standard.StandardAnalyzer
-chunk 100
-i /home/w007dhc/data/trecdata-seqfiles/chunk-0
-o /home/w007dhc/data/trecdata-vectors
-md 1 -x 75 -wt TFIDF -n 0 -w



-----
--cheers
Delroy
