I have been running it with no problems for some time. I previously posted the code from the script I use to run it (shown at the bottom of this link):

http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%3CCACYXym_LJJgWMBQ4TEgPJjx2PUvjicA1kZ1Kmo_xQ=wkjnu...@mail.gmail.com%3E

CVB call:

$MAHOUT cvb \
  -i ${WORK_DIR}/sparse-vectors-cvb \
  -o ${WORK_DIR}/reuters-cvb \
  -k 150 -ow -x 10 \
  -dict ${WORK_DIR}/reuters-out-seqdir-sparse-cvb/dictionary.file-0 \
  -mt ${WORK_DIR}/topic-model-cvb \
  -dt ${WORK_DIR}/doc-topic-cvb

So, I guess one thing to check is that your input vector folder actually contains files, and that they are in the correct format (i.e., the keys need to be Integers), which is why I run rowid on my sparse vectors first. Also make sure that any other inputs being passed exist and are in the correct format (e.g., -dict).

In my case I also specify -mt; I seem to recall having issues when not doing so. Before a run I also delete the -mt output, as I had trouble when a prior run had ended with an error.

I'm not sure what your "-a" parameter does.

Dan
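P.S. The checks described above (the input folder actually contains files, the -dict file exists, leftovers from a failed run are removed) could be sketched as a small pre-flight script. This is only a sketch under assumptions: it works on the local filesystem, not HDFS, and the function names has_data_files, clean_stale_state and preflight are made up for illustration and are not part of Mahout. It does not verify that the vector keys are Integers.

```shell
#!/bin/sh
# Hypothetical pre-flight checks to run before `mahout cvb`.
# Assumption: all paths are on the local filesystem (not HDFS).

# Succeeds if the directory exists and holds at least one non-empty
# regular file, ignoring hidden files and marker files such as _SUCCESS.
has_data_files() {
    dir=$1
    [ -d "$dir" ] || return 1
    [ -n "$(find "$dir" -type f -size +0 ! -name '.*' ! -name '_*' | head -n 1)" ]
}

# Removes every path given as an argument (e.g. the -mt output of an
# earlier run that ended with an error). Missing paths are ignored.
clean_stale_state() {
    for path in "$@"; do
        rm -rf -- "$path"
    done
}

# Usage: preflight INPUT_DIR DICT_FILE [STALE_PATH...]
# Stale paths are only removed once the inputs have been verified.
preflight() {
    input_dir=$1
    dict_file=$2
    shift 2
    has_data_files "$input_dir" || {
        echo "no vector files found in $input_dir" >&2
        return 1
    }
    [ -f "$dict_file" ] || {
        echo "dictionary file missing: $dict_file" >&2
        return 1
    }
    clean_stale_state "$@"
}
```

With the paths from my call it would be used as: preflight "${WORK_DIR}/sparse-vectors-cvb" "${WORK_DIR}/reuters-out-seqdir-sparse-cvb/dictionary.file-0" "${WORK_DIR}/topic-model-cvb". The temp/topicModelState directory that shows up in your log may be another candidate to pass as a stale path, though that is a guess on my part.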
________________________________
From: seth <[email protected]>
To: [email protected]
Sent: Thursday, July 12, 2012 7:18 PM
Subject: help with cvb

I'm trying to run the cvb lda algorithm like so:

$MAHOUT_HOME/mahout cvb -i ./mahout_data/vectors/vectors/vectors\ for\ cvb/ -ow -x 20 -o ./mahout_data/clusters/ -k 140 -dt dist.txt -dict ./mahout_data/vectors/vectors/dictionary.file-0 -a 3

but I get this error:

12/07/12 16:06:06 INFO cvb.CVB0Driver: Will run Collapsed Variational Bayes (0th-derivative approximation) learning for LDA on mahout_data/vectors/vectors/vectors for cvb (numTerms: 20165), finding 140-topics, with document/topic prior 3.0, topic/term prior 1.0E-4. Maximum iterations to run will be 20, unless the change in perplexity is less than 0.0. Topic model output (p(term|topic) for each topic) will be stored mahout_data/clusters. Random initialization seed is 9411, holding out 0.0 of the data for perplexity check
12/07/12 16:06:06 INFO cvb.CVB0Driver: Dictionary to be used located mahout_data/vectors/vectors/dictionary.file-0 p(topic|docId) will be stored dist.txt
12/07/12 16:06:06 INFO cvb.CVB0Driver: Found previous state: temp/topicModelState/model-1
12/07/12 16:06:06 INFO cvb.CVB0Driver: Current iteration number: 1
12/07/12 16:06:06 WARN cvb.CVB0Driver: Perplexity path temp/topicModelState/perplexity-1 does not exist, returning NaN
12/07/12 16:06:06 INFO cvb.CVB0Driver: About to run iteration 2 of 20
12/07/12 16:06:06 INFO cvb.CVB0Driver: About to run: Iteration 2 of 20, input path: temp/topicModelState/model-1
Exception in thread "main" java.lang.IllegalStateException: No part files found in model path 'temp/topicModelState/model-1'
    at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.setModelPaths(CVB0Driver.java:529)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:515)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:304)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:187)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.main(CVB0Driver.java:550)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

Can anyone help me understand what I'm missing?

Thanks,
Seth

--
View this message in context: http://lucene.472066.n3.nabble.com/help-with-cvb-tp3994763.html
Sent from the Mahout User List mailing list archive at Nabble.com.
