Re: Mahout LDA Parameter: maxIter

Jeff Eastman Sun, 23 May 2010 14:38:38 -0700

Changing it in the LDAState works fine (at least it runs Reuters with --numWords=1), but numWords is also used to initialize the state data in LDADriver.writeInitialState():


      double total = 0.0; // total number of pseudo counts we made
      for (int w = 0; w < numWords; ++w) {
        IntPairWritable kw = new IntPairWritable(k, w);
        // A small amount of random noise, minimized by having a floor.
        double pseudocount = random.nextDouble() + 1.0E-8;
        total += pseudocount;
        v.set(Math.log(pseudocount));
        writer.append(kw, v);
      }


I don't want to use Integer.MAX_VALUE here :)
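
To make the cost concrete, here is a minimal sketch of how the initial state grows with numWords, assuming one entry is written per (topic, word) pair as in the loop above. The class name, the array-based storage, and the topic count of 20 are illustrative assumptions, not Mahout code:

```java
import java.util.Random;

// Hypothetical sketch, not part of Mahout.
final class InitialStateSketch {

    // Number of (topic, word) entries the initial state would contain,
    // assuming one entry per pair. Computed in long to avoid int overflow.
    public static long entryCount(int numTopics, int numWords) {
        return (long) numTopics * (long) numWords;
    }

    // Mirrors the quoted loop for a single topic, but collects the
    // log pseudocounts in an array instead of appending them to a writer.
    public static double[] logPseudocounts(int numWords, Random random) {
        double[] logs = new double[numWords];
        for (int w = 0; w < numWords; ++w) {
            // A small amount of random noise, with a floor so log() stays finite.
            double pseudocount = random.nextDouble() + 1.0E-8;
            logs[w] = Math.log(pseudocount);
        }
        return logs;
    }

    public static void main(String[] args) {
        int numTopics = 20; // assumed value, for illustration only
        System.out.println(entryCount(numTopics, 50000));
        System.out.println(entryCount(numTopics, Integer.MAX_VALUE));
    }
}
```

With 20 topics, numWords=50000 gives one million entries, while Integer.MAX_VALUE gives roughly 43 billion, which is why an arbitrarily large bound is not an option for this loop even if the matrix itself were sparse.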

On 5/23/10 2:14 PM, Jeff Eastman wrote:

Yes, it is a DenseMatrix. Providing a value that is too large just wastes some space. I'll try the random access approach and see what happens...

On 5/23/10 2:09 PM, Ted Dunning wrote:
What happens if the number is too large?  Is this a dense matrix we are
talking about?
Would it work to make it a random access sparse matrix with very, very
large bounds?

On Sun, May 23, 2010 at 10:29 AM, Jeff Eastman <[email protected]> wrote:
I agree it is not very friendly. It is impossible to tell the correct value
in the options-processing section. It needs to be >= the actual number of
unique terms in the corpus, and that is hard to anticipate, though I think it
is known in seq2sparse. If it turns out to be the dictionary size (I'm
investigating), then it could be computed by adding a dictionary path
argument instead of the current option. The trouble with that is that the
dictionary is not needed for anything else by LDA.
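
The constraint described here (numWords must be at least the number of unique terms) could in principle be checked before any arrays are indexed. The following is only a sketch under the assumption that term indices are 0-based integers. The class and method names are hypothetical and do not exist in Mahout:

```java
// Hypothetical sketch, not part of Mahout.
final class NumWordsCheck {

    // Largest term index seen in a corpus given as arrays of term indices,
    // or -1 if the corpus contains no terms at all.
    public static int maxTermIndex(int[][] docs) {
        int max = -1;
        for (int[] doc : docs) {
            for (int termIndex : doc) {
                max = Math.max(max, termIndex);
            }
        }
        return max;
    }

    // Fails fast with a readable message instead of letting a later array
    // access throw ArrayIndexOutOfBoundsException.
    public static void checkNumWords(int numWords, int maxTermIndex) {
        if (maxTermIndex >= numWords) {
            throw new IllegalArgumentException(
                "numWords=" + numWords + " is too small: the corpus contains term index "
                + maxTermIndex + ", so numWords must be at least " + (maxTermIndex + 1));
        }
    }
}
```

A driver could call checkNumWords once after scanning the input vectors. The price is an extra pass over the corpus, which may or may not be acceptable.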

On 5/23/10 9:38 AM, Sean Owen wrote:
Is there a way to catch that with a more descriptive error earlier? I
always think AIOOBE looks bad.

On May 23, 2010 4:11 PM, "Jeff Eastman" <[email protected]> wrote:

Yes, your -numWords option is set too low and that's causing the array
exception. Try -v 50000.



On 5/23/10 3:20 AM, 杨杰 wrote:
Jeff and Robin,

Thank you for your suggestion! There is anot...
