Yes, your -numWords option is set too low and that's causing the array exception. Try -v 50000.
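To make the failure mode concrete: a minimal sketch (my own illustration, not Mahout's actual code) of why an undersized numWords triggers the ArrayIndexOutOfBoundsException below. LDA inference indexes per-word arrays sized by numWords, so any word id >= numWords overflows the array; the class and method names here are hypothetical.

```java
public class NumWordsSketch {
    // Returns true if indexing wordId into a numWords-sized array fails,
    // mimicking the failure mode seen in LDAInference.infer().
    static boolean overflows(int wordId, int numWords) {
        double[] phi = new double[numWords]; // per-word topic parameters
        try {
            double unused = phi[wordId];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true; // wordId >= numWords
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(123, 100));   // numWords too low: true
        System.out.println(overflows(123, 50000)); // generous bound: false
    }
}
```

Setting numWords comfortably above the vocabulary size (e.g. 50000) avoids the overflow.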

On 5/23/10 3:20 AM, 杨杰 wrote:
Jeff and Robin,

Thank you for your suggestions! There is another problem: after
compiling the source from trunk and applying the MAHOUT-397 patch, I
retried the LDA experiment, but another exception was thrown:

10/05/23 17:01:52 INFO common.HadoopUtil: Deleting mahout/seq-sparse-tf/lda-out
10/05/23 17:01:55 INFO lda.LDADriver: Iteration 1
10/05/23 17:01:55 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
10/05/23 17:01:56 INFO input.FileInputFormat: Total input paths to process : 1
10/05/23 17:01:56 INFO mapred.JobClient: Running job: job_201005231654_0001
10/05/23 17:01:57 INFO mapred.JobClient:  map 0% reduce 0%
10/05/23 17:02:10 INFO mapred.JobClient: Task Id :
attempt_201005231654_0001_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 123
        at 
org.apache.mahout.clustering.lda.LDAInference.infer(LDAInference.java:106)
        at org.apache.mahout.clustering.lda.LDAMapper.map(LDAMapper.java:45)
        at org.apache.mahout.clustering.lda.LDAMapper.map(LDAMapper.java:36)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

The command is the same as before except for "-ow", which was "-w" in
the 0.3 distribution; the dataset is also the same one I used with
Mahout 0.3 (where the experiment works fine, except that there is
*only one map* in each iteration).

Is this because some other patch is missing? Or have I made some other
mistake in my setup?

Thank you!


On Sun, May 23, 2010 at 8:01 AM, Robin Anil <[email protected]> wrote:
David's rule of thumb was to let the iterations go until the relative
change in LL drops to around 10^-4
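That stopping rule can be sketched as code; this is my own illustration, not Mahout's API (the converged() helper and its arguments are hypothetical names):

```java
public class ConvergenceSketch {
    // Returns true when the relative change in log-likelihood (LL)
    // between two successive iterations has dropped below ~10^-4,
    // following the rule of thumb above.
    static boolean converged(double prevLL, double ll) {
        return Math.abs((ll - prevLL) / prevLL) < 1.0e-4;
    }

    public static void main(String[] args) {
        // LL is negative and climbs toward 0 as LDA iterates.
        System.out.println(converged(-1000000.0, -999999.95)); // tiny change: true
        System.out.println(converged(-1000000.0, -990000.0));  // 1% change: false
    }
}
```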

Robin

On Sat, May 22, 2010 at 9:12 PM, Jeff Eastman <[email protected]> wrote:

I suggest you try running with a trunk checkout and upgrading to Hadoop
0.20.2. Mahout is still in motion and I've run LDA on Reuters on trunk in
the last few days. The maxIter parameter should not be an issue; you could
try removing it entirely and LDA will default to running to convergence
(about 100 iterations which can take some time). I've found the Reuters
results don't change too much after 20. Even with a clean trunk checkout
Reuters will only use a single node and the iterations should take about 5
mins each. If you want to run on a multi-node cluster, install the patch in
MAHOUT-397 (


https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
and use the same arguments as in examples/bin/build-reuters.sh. Even on a
3-node cluster this brings the iteration time down to about a minute and a
half which is worth doing.

Hope this helps,
Jeff

http://www.windwardsolutions.com




On 5/22/10 5:40 AM, 杨杰 wrote:

Hi, everyone

I'm trying Mahout now. When running LDA on the Reuters corpus
(
http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/
),
one parameter refuses to work: "maxIter". Without it, I cannot
control how many iterations to run.

My command is:
bin/mahout.hadoop lda --input mahout/seq-sparse-tf/vectors --output
mahout/seq-sparse-tf/lda-out5 --numWords 34000 --numTopics 20
--maxIter 1

But I got an exception:
10/05/22 20:32:11 ERROR lda.LDADriver: Exception
org.apache.commons.cli2.OptionException: Unexpected 2 while processing
Options
        at
org.apache.commons.cli2.commandline.Parser.parse(Parser.java:100)
        at
org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:115)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at
org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at
org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
...

What's the problem? I'm using version 0.3 & Hadoop 0.20.0.

Thank you!







