Hi everyone,

I have tried several of the clustering algorithms in Mahout and they worked
great, but I have a problem with the CVB implementation of Latent Dirichlet
Allocation. The cvb command runs fine, but running clusterdump on its output
gives me the following error:

Exception in thread "main" java.lang.ClassCastException:
org.apache.mahout.math.VectorWritable cannot be cast to
org.apache.mahout.clustering.iterator.ClusterWritable

What I do, in detail:
1) mahout seqdirectory -c UTF-8 -i inputdir -o sequencefiles
2) mahout seq2sparse -i sequencefiles -o sparsevectors -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -x 99 -wt tfidf -s 5 -md 1 -x 90 -ng 2 -ml 50 -seq -n 2
3) mahout rowid -i sparsevectors/tf-vectors -o rowidresult
4) mahout cvb -i rowidresult/matrix -dict sparsevectors/dictionary.file-0 -o topics -dt documents -mt states -ow -k 10
5) mahout clusterdump -i topics -o clusters -of TEXT -n 10 -d marcelproust/dictionary.file-0 -dt sequencefile
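
To double-check what cvb actually wrote into topics, here is a small Python sketch that reads the key and value class names from the header of one output part file. It assumes the file has been copied to the local filesystem first (for example with hadoop fs -get; the file name part-m-00000 used below is only a guess). It also assumes the Hadoop SequenceFile version-6 header layout as I understand it: the magic bytes SEQ, a version byte, then the key and value class names, each stored as a variable-length integer length followed by UTF-8 bytes. If the value class comes back as org.apache.mahout.math.VectorWritable, that would match the exception above.

```python
import struct
import sys
from typing import BinaryIO, Tuple


def read_vlong(stream: BinaryIO) -> int:
    """Decode a Hadoop variable-length integer (as written by WritableUtils.writeVLong)."""
    first = struct.unpack("b", stream.read(1))[0]  # first byte, read as signed
    if first >= -112:
        return first  # small values are stored in this single byte
    negative = first < -120
    # The first byte encodes the total size (itself plus the value bytes that follow).
    size = (-119 - first) if negative else (-111 - first)
    value = 0
    for _ in range(size - 1):
        value = (value << 8) | stream.read(1)[0]  # big-endian value bytes
    return ~value if negative else value


def read_text(stream: BinaryIO) -> str:
    """Read a string stored as a vlong byte length followed by UTF-8 bytes."""
    length = read_vlong(stream)
    return stream.read(length).decode("utf-8")


def read_header_classes(stream: BinaryIO) -> Tuple[str, str]:
    """Return (key class, value class) from a SequenceFile header."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a Hadoop SequenceFile")
    version = stream.read(1)[0]
    if version != 6:
        # Assumption: this sketch only handles the version-6 header layout.
        raise ValueError(f"unsupported SequenceFile version {version}")
    key_class = read_text(stream)
    value_class = read_text(stream)
    return key_class, value_class


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        key_cls, value_cls = read_header_classes(f)
    print("key class:  ", key_cls)
    print("value class:", value_cls)
```

Usage, with the hypothetical part file name: python seqheader.py part-m-00000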

When I run command 5, I get the error above. Unfortunately, I could not
find a working solution in the list archives, so I thought I'd ask the
community!

Thanks a lot in advance.
Jeremie
