I suggest you try running with a trunk checkout and upgrading to Hadoop
0.20.2. Mahout is still in motion, and I have run LDA on Reuters on trunk
in the last few days. The maxIter parameter should not be an issue; you
could try removing it entirely, and LDA will default to running to
convergence (about 100 iterations, which can take some time). I've found
the Reuters results don't change much after 20 iterations. Even with a clean
trunk checkout, Reuters will only use a single node, and the iterations
should take about 5 minutes each. If you want to run on a multi-node
cluster, install the patch in MAHOUT-397 (
https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
and use the same arguments as in examples/bin/build-reuters.sh. Even on a
3-node cluster this brings the iteration time down to about a minute and a
half, which is worth doing.
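For example, dropping --maxIter from the command you posted would look roughly like the sketch below. The arguments are copied from your message. I'm assuming the launcher is still invoked as bin/mahout.hadoop, which may differ on a trunk checkout. The echo makes this a dry run that only prints the command.

```shell
# Sketch only: same arguments as the command in the quoted question, minus --maxIter.
# Assumption: the launcher name bin/mahout.hadoop is unchanged on your checkout.
LDA_ARGS="lda --input mahout/seq-sparse-tf/vectors --output mahout/seq-sparse-tf/lda-out5 --numWords 34000 --numTopics 20"

# Dry run: prints the command. Remove 'echo' to actually launch the job.
echo bin/mahout.hadoop $LDA_ARGS
```

Without --maxIter the job should run to convergence, so expect it to take a while on a single node.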
Hope this helps,
Jeff
http://www.windwardsolutions.com
On 5/22/10 5:40 AM, 杨杰 wrote:
Hi everyone,
I'm trying out Mahout. When I run LDA on the Reuters corpus
(http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/),
one parameter does not work: "maxIter". Without it, I cannot control
how many iterations are run.
My command is:
bin/mahout.hadoop lda --input mahout/seq-sparse-tf/vectors --output
mahout/seq-sparse-tf/lda-out5 --numWords 34000 --numTopics 20
--maxIter 1
But I got an exception:
10/05/22 20:32:11 ERROR lda.LDADriver: Exception
org.apache.commons.cli2.OptionException: Unexpected 2 while processing Options
at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:100)
at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:115)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
...
What's the problem? I'm using Mahout 0.3 and Hadoop 0.20.0.
Thank you!