Hi everyone, I have tried several of the clustering algorithms in Mahout and they worked great, but I have a problem with the cvb implementation of Latent Dirichlet Allocation. The cvb command works fine, but running clusterdump afterwards gives me the following error:
Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.math.VectorWritable cannot be cast to org.apache.mahout.clustering.iterator.ClusterWritable

Here is what I do, in detail:

1) mahout seqdirectory -c UTF-8 -i inputdir -o sequencefiles
2) mahout seq2sparse -i sequencefiles -o sparsevectors -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -x 99 -wt tfidf -s 5 -md 1 -x 90 -ng 2 -ml 50 -seq -n 2
3) mahout rowid -i sparsevectors/tf-vectors -o rowidresult
4) mahout cvb -i rowresult/matrix -dict sparsevectors/dictionary.file-0 -o topics -dt documents -mt states -ow -k 10
5) mahout clusterdump -i topics -o clusters -of TEXT -n 10 -d marcelproust/dictionary.file-0 -dt sequencefile

When I run command 5, I get the error above. Unfortunately, I could not find a working solution after searching the archives, so I thought I'd ask the community!

Thanks a lot in advance.

Jeremie
