We faced a similar issue with Mahout 0.7; it is resolved in Mahout 0.8.

-----Original Message-----
From: dilpreet singh [mailto:[email protected]]
Sent: Wednesday, July 10, 2013 4:59 AM
To: [email protected]
Subject: Re: Error running mahout cvb

Thanks for the advice, guys. That was helpful. I modified the script to create a matrix. Now I am hitting this error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0

I think something might be wrong with vectordump, but I am not sure. My script now looks like this:

export HADOOP_HOME=/usr/local/hadoop/hadoop-1.1.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export JAVA_OPTS=="-Xmx12240m -Xms1024m -server"

./mahout seqdirectory --input /mahout/lda_input --output /mahout/input_seqfiles -c UTF8
./mahout seq2sparse -i /mahout/input_seqfiles -o /mahout/input_seqparse -wt tf
./mahout rowid -i /mahout/input_seqparse/tf-vectors -o /mahout/matrix

rm -rf /mahout/lda_output/final_output
rm -rf /mahout/lda_output/docTopics

./mahout cvb -i /mahout/matrix/matrix -o /mahout/lda_output/final_output -mt /mahout/lda_output/models -dt /mahout/lda_output/docTopics -k 6 -nt 10 -x 4 -ow
./mahout vectordump -i /mahout/lda_output/final_output -d /mahout/input_seqparse/dictionary.file-0 -dt sequencefile --vectorSize 10 --printKey TRUE

I was expecting to see the top 10 terms from each topic in the terminal. Any suggestions?

Dilpreet Singh

On Tue, Jul 9, 2013 at 2:21 AM, Corey Hyllested <[email protected]> wrote:
> Agreed.
>
> After seq2sparse, you need to create a matrix.
>
> http://stackoverflow.com/questions/14757162/run-cvb-in-mahout-0-8
>
> so, something like this.
>
> mahout rowid -i $work_dir/input_seqparse/tf-vectors -o $work_dir/matrix
>
> mahout cvb -i $work_dir/matrix/ -o $work_dir/lda_output -mt
> $work_dir/lda_output/models -dt $work_dir/lda_output/docTopics -k 3
> -nt -maxIter 200
>
> Unsolicited advice.
>
> There is no reason to trash your sequence files (rm -rf
> $work_dir/input_seqfiles) each time.
>
> Provide a model location, this allows the computation to pick up where
> it left off if something were to go awry.
>
> - Corey
>
>
> On Mon, Jul 8, 2013 at 10:43 PM, Gmail <[email protected]> wrote:
>
> > Hi
> >
> > I am trying to run the mahout cvb on hadoop cluster using some text files
> > as input. I am getting the following error:
> >
> > Exception in thread "main" java.lang.IllegalStateException: No part files
> > found in model path 'temp/topicModelState/model-1'
> >
> > My script for running mahout cvb looks like this:
> >
> > export work_dir=/home/mahout
> >
> > rm -rf $work_dir/input_seqfiles
> >
> > ./mahout seqdirectory --input $work_dir/lda_input --output
> > $work_dir/input_seqfiles -c UTF8
> >
> > rm -rf $work_dir/input_seqparse
> >
> > ./mahout seq2sparse -i $work_dir/input_seqfiles -o
> > $work_dir/input_seqparse -wt tf
> >
> > ./mahout cvb -i $work_dir/input_seqparse -o $work_dir/lda_output -k 3 -nt
> > 10 --maxIter 200
> >
> > Is there something I am missing? Any help or suggestion is greatly
> > appreciated.
> >
> > Thanks
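
The stage order this thread converges on (seqdirectory, seq2sparse, rowid, then cvb) can be sketched as a small shell wrapper. This is only a sketch under assumptions: the flags, values, and paths are copied from the messages above rather than checked against any particular Mahout release, and the `MAHOUT`, `WORK_DIR`, and `DRY_RUN` variables and the `run` and `pipeline` helpers are hypothetical names introduced here for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the stage order discussed in the thread.
# Assumptions: MAHOUT, WORK_DIR, DRY_RUN, run and pipeline are illustrative
# names; flags and values are copied from the thread, not recommendations.
MAHOUT=${MAHOUT:-./mahout}
WORK_DIR=${WORK_DIR:-/mahout}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print each command, 0 = execute it

# Print a command, and execute it only when DRY_RUN is 0.
# Stops the script at the first stage that fails.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

pipeline() {
    # 1. Text files -> sequence files
    run "$MAHOUT" seqdirectory --input "$WORK_DIR/lda_input" \
        --output "$WORK_DIR/input_seqfiles" -c UTF8
    # 2. Sequence files -> term-frequency vectors
    run "$MAHOUT" seq2sparse -i "$WORK_DIR/input_seqfiles" \
        -o "$WORK_DIR/input_seqparse" -wt tf
    # 3. The step missing from the first script: build the matrix for cvb
    run "$MAHOUT" rowid -i "$WORK_DIR/input_seqparse/tf-vectors" \
        -o "$WORK_DIR/matrix"
    # 4. Run cvb on the matrix, with explicit model and doc-topic locations
    run "$MAHOUT" cvb -i "$WORK_DIR/matrix/matrix" \
        -o "$WORK_DIR/lda_output/final_output" \
        -mt "$WORK_DIR/lda_output/models" \
        -dt "$WORK_DIR/lda_output/docTopics" \
        -k 6 -nt 10 -x 4 -ow
}

pipeline
```

Running it with `DRY_RUN=0` executes the commands and stops at the first stage that fails, which makes it easier to tell which stage produced a given exception.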
