It's caused by not setting the correct word count, I believe. Use the same value as the dictionary count. It has to be fixed one of these days.
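To illustrate the failure mode, here is a hypothetical sketch (not the actual Mahout source; the class and variable names are invented): if the inference step keeps per-topic word scores in an array sized by --numWords and indexes it by term id, then any dictionary id >= numWords overflows it, which matches the ArrayIndexOutOfBoundsException: 123 reported below.

```java
// Hypothetical sketch, not Mahout code: shows why a --numWords value
// smaller than the dictionary size causes an out-of-bounds term id.
public class NumWordsCheck {
    public static void main(String[] args) {
        int numWords = 100;                        // value passed as --numWords
        double[] topicTermScores = new double[numWords];
        int termId = 123;                          // a term id from the dictionary
        try {
            topicTermScores[termId] = 0.5;         // index by term id, as inference would
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("term id " + termId + " >= --numWords " + numWords
                + "; pass the dictionary size instead");
        }
    }
}
```

Since any term id at or past numWords triggers this, passing the dictionary entry count as --numWords avoids it entirely.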
Robin

On Sun, May 23, 2010 at 3:50 PM, 杨杰 <[email protected]> wrote:
> Jeff and Robin,
>
> Thank you for your suggestion! There is another problem: having compiled
> the source from trunk and applied the patch MAHOUT-397, I retried the lda
> experiment, but another exception was thrown:
>
> 10/05/23 17:01:52 INFO common.HadoopUtil: Deleting mahout/seq-sparse-tf/lda-out
> 10/05/23 17:01:55 INFO lda.LDADriver: Iteration 1
> 10/05/23 17:01:55 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 10/05/23 17:01:56 INFO input.FileInputFormat: Total input paths to process : 1
> 10/05/23 17:01:56 INFO mapred.JobClient: Running job: job_201005231654_0001
> 10/05/23 17:01:57 INFO mapred.JobClient: map 0% reduce 0%
> 10/05/23 17:02:10 INFO mapred.JobClient: Task Id :
> attempt_201005231654_0001_m_000000_0, Status : FAILED
> java.lang.ArrayIndexOutOfBoundsException: 123
>     at org.apache.mahout.clustering.lda.LDAInference.infer(LDAInference.java:106)
>     at org.apache.mahout.clustering.lda.LDAMapper.map(LDAMapper.java:45)
>     at org.apache.mahout.clustering.lda.LDAMapper.map(LDAMapper.java:36)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> The COMMAND is the same as the former one, except "-ow", which was "-w"
> in the 0.3 distribution; the dataset is also the same as with Mahout 0.3
> (on which the experiment works OK, except for *only one map* in each
> iteration~).
>
> Is it because of the absence of some other patches? Or is there some
> other mistake in my operations?
>
> Thank you!
>
> On Sun, May 23, 2010 at 8:01 AM, Robin Anil <[email protected]> wrote:
> > David's rule of thumb was to let the iterations go until the relative
> > change in LL becomes around 10^-4.
> >
> > Robin
> >
> > On Sat, May 22, 2010 at 9:12 PM, Jeff Eastman <[email protected]> wrote:
> >
> >> I suggest you try running with a trunk checkout and upgrading to Hadoop
> >> 0.20.2. Mahout is still in motion, and I've run LDA on Reuters on trunk
> >> in the last few days. The maxIter parameter should not be an issue; you
> >> could try removing it entirely, and LDA will default to running to
> >> convergence (about 100 iterations, which can take some time). I've found
> >> the Reuters results don't change too much after 20. Even with a clean
> >> trunk checkout, Reuters will only use a single node, and the iterations
> >> should take about 5 mins each. If you want to run on a multi-node
> >> cluster, install the patch in MAHOUT-397 (
> >> https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >> ) and use the same arguments as in examples/bin/build-reuters.sh. Even
> >> on a 3-node cluster, this brings the iteration time down to about a
> >> minute and a half, which is worth doing.
> >>
> >> Hope this helps,
> >> Jeff
> >>
> >> http://www.windwardsolutions.com
> >>
> >> On 5/22/10 5:40 AM, 杨杰 wrote:
> >>
> >>> Hi, everyone
> >>>
> >>> I'm trying Mahout now. When running LDA on the Reuters corpus
> >>> (http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/),
> >>> a parameter refuses to work.
> >>> This parameter is "maxIter", without which I cannot choose the number
> >>> of iterations to run~
> >>>
> >>> My CMD is:
> >>> bin/mahout.hadoop lda --input mahout/seq-sparse-tf/vectors --output
> >>> mahout/seq-sparse-tf/lda-out5 --numWords 34000 --numTopics 20
> >>> --maxIter 1
> >>>
> >>> But I got an exception:
> >>> 10/05/22 20:32:11 ERROR lda.LDADriver: Exception
> >>> org.apache.commons.cli2.OptionException: Unexpected 2 while processing Options
> >>>     at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:100)
> >>>     at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:115)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
> >>>     ...
> >>>
> >>> What's the problem? I'm using version 0.3 & Hadoop 0.20.0.
> >>>
> >>> Thank you!
>
> --
> Yang Jie (杨杰)
> hi.baidu.com/thinkdifferent
>
> Group of CLOUD, Xi'an Jiaotong University
> Department of Computer Science and Technology, Xi'an Jiaotong University
>
> PHONE: 86 1346888 3723
> TEL: 86 29 82665263 EXT. 608
> MSN: [email protected]
>
> once i didn't know software is not free, but found it days later; now
> i realize that it's indeed free.
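For reference, David's rule of thumb quoted above (iterate until the relative change in LL is around 10^-4) can be sketched as follows. This is only an illustration: the class name, helper method, and log-likelihood values are all invented, not taken from Mahout.

```java
// Sketch of the stopping rule from this thread: stop once the relative
// change in log-likelihood (LL) drops below roughly 1e-4.
// The LL sequence below is fabricated purely for illustration.
public class ConvergenceCheck {
    static boolean converged(double prevLL, double ll) {
        return Math.abs((ll - prevLL) / prevLL) < 1.0e-4;
    }

    public static void main(String[] args) {
        double[] ll = {-1000000.0, -950000.0, -949950.0, -949945.0};
        for (int i = 1; i < ll.length; i++) {
            if (converged(ll[i - 1], ll[i])) {
                System.out.println("converged at iteration " + i);
                return;
            }
        }
        System.out.println("not converged; keep iterating");
    }
}
```

With these invented values, the relative change from -950000 to -949950 is about 5e-5, so the loop reports convergence at the second iteration; a real run would read the LL from each LDA iteration's output instead.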
