Agreed.
After seq2sparse, you need to create a matrix before running cvb. See:

http://stackoverflow.com/questions/14757162/run-cvb-in-mahout-0-8

So, something like this:

    mahout rowid -i $work_dir/input_seqparse/tf-vectors -o $work_dir/matrix

    mahout cvb -i $work_dir/matrix/ -o $work_dir/lda_output -mt $work_dir/lda_output/models -dt $work_dir/lda_output/docTopics -k 3 -nt 10 --maxIter 200

Unsolicited advice: there is no reason to trash your sequence files (rm -rf $work_dir/input_seqfiles) each time. Also, provide a model location (-mt); this allows the computation to pick up where it left off if something goes awry.

- Corey

On Mon, Jul 8, 2013 at 10:43 PM, Gmail wrote:

> Hi
>
> I am trying to run the mahout cvb on hadoop cluster using some text files
> as input. I am getting the following error:
>
> Exception in thread "main" java.lang.IllegalStateException: No part files
> found in model path 'temp/topicModelState/model-1'
>
> My script for running mahout cvb looks like this:
>
> export work_dir=/home/mahout
>
> rm -rf $work_dir/input_seqfiles
>
> ./mahout seqdirectory --input $work_dir/lda_input --output
> $work_dir/input_seqfiles -c UTF8
>
> rm -rf $work_dir/input_seqparse
>
> ./mahout seq2sparse -i $work_dir/input_seqfiles -o
> $work_dir/input_seqparse -wt tf
>
> ./mahout cvb -i $work_dir/input_seqparse -o $work_dir/lda_output -k 3 -nt
> 10 --maxIter 200
>
> Is there something I am missing? Any help or suggestion is greatly
> appreciated.
>
> Thanks
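Putting the reply's suggestions together, the whole pipeline might look roughly like the sketch below. It is only an illustration, not a tested recipe: the `run` and `pipeline` helpers and the `DRY_RUN` and `num_terms` variables are made up for this example, the paths are assumed to be on the local filesystem (as the `rm -rf` in the original script suggests), and the flags are the ones already used in this thread. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
work_dir=${work_dir:-/home/mahout}
DRY_RUN=${DRY_RUN:-1}
# 10 is the value from the original script. As far as I know -nt is the
# vocabulary size, so it probably needs adjusting for real data.
num_terms=${num_terms:-10}

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

pipeline() {
  # Keep existing sequence files instead of deleting and rebuilding them.
  if [ ! -d "$work_dir/input_seqfiles" ]; then
    run mahout seqdirectory --input "$work_dir/lda_input" \
      --output "$work_dir/input_seqfiles" -c UTF8
  fi

  run rm -rf "$work_dir/input_seqparse"
  run mahout seq2sparse -i "$work_dir/input_seqfiles" \
    -o "$work_dir/input_seqparse" -wt tf

  # The step missing from the original script: turn the tf-vectors into a matrix.
  run mahout rowid -i "$work_dir/input_seqparse/tf-vectors" -o "$work_dir/matrix"

  # -mt is an explicit model location, so a rerun can pick up where it left off.
  # If cvb complains about the input, pointing -i at $work_dir/matrix/matrix
  # (as in the Stack Overflow answer) may be worth trying.
  run mahout cvb -i "$work_dir/matrix/" -o "$work_dir/lda_output" \
    -mt "$work_dir/lda_output/models" -dt "$work_dir/lda_output/docTopics" \
    -k 3 -nt "$num_terms" --maxIter 200
}

pipeline
```

Run it once as-is to check the printed commands, then with DRY_RUN=0 to execute them. On a real cluster the existence check would presumably need to go through `hadoop fs` rather than `[ -d ... ]`.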
