I suggest you try running with a trunk checkout and upgrading to Hadoop 0.20.2. Mahout is still in motion, and I've run LDA on Reuters on trunk in the last few days. The maxIter parameter should not be an issue; you could try removing it entirely, and LDA will default to running to convergence (about 100 iterations, which can take some time). I've found the Reuters results don't change much after 20 iterations.

Even with a clean trunk checkout, Reuters will only use a single node, and each iteration should take about 5 minutes. If you want to run on a multi-node cluster, install the patch in MAHOUT-397 (https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel) and use the same arguments as in examples/bin/build-reuters.sh. Even on a 3-node cluster this brings the iteration time down to about a minute and a half, which is worth doing.
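
For reference, here is a rough sketch of the command from the original message with --maxIter dropped, so that LDA falls back to running until convergence. The paths and values are copied from the question; the INPUT, OUTPUT and CMD variable names are only illustrative. It prints the command rather than running it, since a real run needs a Mahout checkout and a working Hadoop install, and it has not been checked against the OptionException reported below.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the LDA command without --maxIter.
# Paths and option values are taken from the original command in this thread;
# the variable names are illustrative assumptions.
INPUT=mahout/seq-sparse-tf/vectors
OUTPUT=mahout/seq-sparse-tf/lda-out5

# --maxIter is deliberately omitted so LDA runs to convergence.
CMD="bin/mahout.hadoop lda --input $INPUT --output $OUTPUT --numWords 34000 --numTopics 20"

# Print the command; to run it for real from the Mahout directory,
# replace this echo with: $CMD
echo "$CMD"
```

Keeping the whole command on one line also avoids any stray line breaks between options when it is pasted into a terminal.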

Hope this helps,
Jeff

http://www.windwardsolutions.com



On 5/22/10 5:40 AM, 杨杰 wrote:
Hi, everyone

I'm trying Mahout now. When running LDA on the Reuters corpus
(http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/),
one parameter refuses to work. This parameter is "maxIter"; without
it, I cannot control how many iterations to run.

My CMD is:
bin/mahout.hadoop lda --input mahout/seq-sparse-tf/vectors --output
mahout/seq-sparse-tf/lda-out5 --numWords 34000 --numTopics 20
--maxIter 1

But I got an exception:
10/05/22 20:32:11 ERROR lda.LDADriver: Exception
org.apache.commons.cli2.OptionException: Unexpected 2 while processing Options
        at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:100)
        at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:115)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
...

What's the problem? I'm using Mahout 0.3 and Hadoop 0.20.0.

Thank you!


