Changing it in LDAState works fine (at least it runs Reuters with --numWords=1), but numWords is also used to initialize the state data in LDADriver.writeInitialState():

      double total = 0.0; // total number of pseudo counts we made
      for (int w = 0; w < numWords; ++w) {
        IntPairWritable kw = new IntPairWritable(k, w);
        // A small amount of random noise, minimized by having a floor.
        double pseudocount = random.nextDouble() + 1.0E-8;
        total += pseudocount;
        v.set(Math.log(pseudocount));
        writer.append(kw, v);
      }

I don't want to use Integer.MAX_VALUE here :)

On 5/23/10 2:14 PM, Jeff Eastman wrote:
Yes it is a DenseMatrix. Providing a value that is too large just wastes some space. I'll try the random access approach and see what happens...


On 5/23/10 2:09 PM, Ted Dunning wrote:
What happens if the number is too large?  Is this a dense matrix we are
talking about?

Would it work to make it a random access sparse matrix with very, very large
bounds?
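
A minimal sketch of the sparse idea, not Mahout code: the class name SparseTopicWordTable and its methods are hypothetical, and a plain HashMap stands in for a random access sparse matrix. Only cells that were explicitly set take up memory, so a very large upper bound on the word index costs nothing by itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not part of Mahout. */
final class SparseTopicWordTable {
  // Only cells that were explicitly set are stored.
  private final Map<Long, Double> cells = new HashMap<>();
  private final double defaultValue;

  SparseTopicWordTable(double defaultValue) {
    this.defaultValue = defaultValue;
  }

  // Pack (topic, word) into one long so any non-negative int pair is a valid key.
  private static long key(int topic, int word) {
    return ((long) topic << 32) | (word & 0xFFFFFFFFL);
  }

  void set(int topic, int word, double value) {
    cells.put(key(topic, word), value);
  }

  // Cells that were never set read back as the default value.
  double get(int topic, int word) {
    return cells.getOrDefault(key(topic, word), defaultValue);
  }

  int storedCells() {
    return cells.size();
  }
}
```

Note that this only helps lookups. A loop that writes one entry per word index up to the bound would still touch every cell.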

On Sun, May 23, 2010 at 10:29 AM, Jeff Eastman <[email protected]> wrote:

I agree it is not very friendly. It is impossible to tell the correct value during options processing. It needs to be >= the actual number of unique terms in the corpus, and that is hard to anticipate, though I think it
is known in seq2sparse. If it turns out to be the dictionary size (I'm
investigating), then it could be computed by adding a dictionary path
argument instead of the current option. Trouble with that is the dictionary
is not needed for anything else by LDA.
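
As a rough illustration of the lower bound described above, the sketch below counts the unique terms in a small in-memory corpus. It assumes the documents are already tokenized into lists of strings. The class VocabularySize is hypothetical and says nothing about how seq2sparse or the dictionary file actually store this information.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not part of Mahout. */
final class VocabularySize {
  private VocabularySize() {}

  /** Returns the number of distinct terms across all documents. */
  static int countUniqueTerms(List<List<String>> tokenizedDocs) {
    Set<String> terms = new HashSet<>();
    for (List<String> doc : tokenizedDocs) {
      terms.addAll(doc);
    }
    return terms.size();
  }
}
```

Any numWords value at or above this count would satisfy the constraint for that corpus.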

On 5/23/10 9:38 AM, Sean Owen wrote:

Is there a way to catch that with a more descriptive error earlier? I always
think AIOOBE looks bad.
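
One way such a check might look, as a sketch only: the helper NumWordsCheck and its method are hypothetical, and where in the LDA code it would be called is left open. It replaces a bare ArrayIndexOutOfBoundsException with a message that names the setting to change.

```java
/** Hypothetical illustration; not part of Mahout. */
final class NumWordsCheck {
  private NumWordsCheck() {}

  /** Fails with a descriptive message if wordIndex does not fit within numWords. */
  static void checkWordIndex(int wordIndex, int numWords) {
    if (wordIndex < 0 || wordIndex >= numWords) {
      throw new IllegalArgumentException(
          "Word index " + wordIndex + " is outside [0, " + numWords
              + "); numWords must be at least the number of unique terms in the corpus");
    }
  }
}
```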

On May 23, 2010 4:11 PM, "Jeff Eastman" <[email protected]> wrote:

Yes, your -numWords option is set too low and that's causing the array
exception. Try -v 50000.



On 5/23/10 3:20 AM, 杨杰 wrote:


Jeff and Robin,

Thank you for your suggestion! There is anot...





