Hi, I'm new here, so please forgive my limited experience with Mahout.

We're trying to use Mahout (on our Hadoop cluster) to extract topics from 
almost 14,000 documents.

I've been following this wiki page (http://goo.gl/DcPVjB) but I'm still getting 
errors.

Here's what I'm doing:

1) creating sequence files from the text files (mahout seqdirectory -i 
jojoba/text-files -o jojoba/seqfiles)
2) creating vectors from the sequence files (mahout seq2sparse -i jojoba/seqfiles 
-o jojoba/vectors -wt tf -nv)
3) launching CVB like this:
mahout cvb -i jojoba/vectors/tf-vectors/ -dict jojoba/vectors/dictionary.file-0 
-o jojoba/to-output -dt jojoba/do-output -k 190 -nt 90000 -mt jojoba/mt 
--maxIter 2 -mipd 1 -a 0.01 -e 0.01 -seed 37 -block 1

and I get:
Exception in thread "main" java.lang.InterruptedException: Failed to complete iteration 1 stage 1

I later learned here 
(http://stackoverflow.com/questions/14757162/run-cvb-in-mahout-0-8/) that I 
should actually feed cvb a matrix and not the vectors (shouldn't that be clearly 
stated in the wiki?).
So then I ran:
mahout rowid -i jojoba/vectors/tf-vectors/ -o jojoba/matrix

3bis) I reran CVB with jojoba/matrix as input, and now I get:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable
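
To make this easier to reproduce, the whole sequence above can be sketched as 
the script below. The commands and paths are the ones from this post; the `run` 
helper and the `RUN` variable are hypothetical additions of mine (not Mahout 
features) so the script only prints the commands unless RUN=1 is set. The 
`hadoop fs -ls` line is just a way to see what rowid actually wrote under 
jojoba/matrix, since I'm assuming rather than sure that the directory itself is 
the right cvb input.

```shell
#!/bin/sh
# Sketch of the pipeline from this post. BASE is the directory used above.
BASE=jojoba

# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# 1) text files -> sequence files
run mahout seqdirectory -i "$BASE/text-files" -o "$BASE/seqfiles"

# 2) sequence files -> term-frequency vectors
run mahout seq2sparse -i "$BASE/seqfiles" -o "$BASE/vectors" -wt tf -nv

# rowid step: vectors -> matrix
run mahout rowid -i "$BASE/vectors/tf-vectors/" -o "$BASE/matrix"

# Inspect what rowid produced before handing it to cvb
run hadoop fs -ls "$BASE/matrix"

# 3bis) CVB with the rowid output as input (same options as in step 3)
run mahout cvb -i "$BASE/matrix" -dict "$BASE/vectors/dictionary.file-0" \
  -o "$BASE/to-output" -dt "$BASE/do-output" -k 190 -nt 90000 -mt "$BASE/mt" \
  --maxIter 2 -mipd 1 -a 0.01 -e 0.01 -seed 37 -block 1
```

Running it without RUN=1 just echoes the five commands, which is handy for 
checking the paths before submitting anything to the cluster.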

What am I missing?

Thanks a lot for your help.
