Changing it in the LDAState works fine (at least it runs Reuters with
--numWords=1) but the numWords is also used to initialize the state data
in LDADriver.writeInitialState():
    double total = 0.0; // total number of pseudo counts we made
    for (int w = 0; w < numWords; ++w) {
      IntPairWritable kw = new IntPairWritable(k, w);
      // A small amount of random noise, minimized by having a floor.
      double pseudocount = random.nextDouble() + 1.0E-8;
      total += pseudocount;
      v.set(Math.log(pseudocount));
      writer.append(kw, v);
    }
I don't want to use Integer.MAX_VALUE here :)
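To make the cost concrete: the loop above writes one record per (topic,
word) pair, so the initial state grows as numTopics * numWords. Below is a
minimal standalone sketch, assuming plain Java collections in place of the
SequenceFile writer and IntPairWritable. The class and method names are
hypothetical and are not Mahout code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class InitialStateSketch {

    /** Returns the log pseudo-counts for one topic, one entry per word index. */
    static List<Double> initTopic(int numWords, Random random) {
        List<Double> logCounts = new ArrayList<>();
        for (int w = 0; w < numWords; ++w) {
            // Same noise-plus-floor scheme as the loop quoted above.
            double pseudocount = random.nextDouble() + 1.0E-8;
            logCounts.add(Math.log(pseudocount));
        }
        return logCounts;
    }

    /** Number of (topic, word) records the initial state would contain. */
    static long recordCount(int numTopics, int numWords) {
        return (long) numTopics * numWords; // widen first to avoid int overflow
    }

    public static void main(String[] args) {
        System.out.println(initTopic(5, new Random(42)).size() + " entries for one topic");
        System.out.println(recordCount(20, Integer.MAX_VALUE) + " records with Integer.MAX_VALUE");
    }
}
```

With 20 topics and numWords set to Integer.MAX_VALUE, recordCount reports
tens of billions of records, which is why a huge placeholder bound is not
practical for this loop even if the in-memory state is sparse.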
On 5/23/10 2:14 PM, Jeff Eastman wrote:
Yes it is a DenseMatrix. Providing a value that is too large just
wastes some space. I'll try the random access approach and see what
happens...
On 5/23/10 2:09 PM, Ted Dunning wrote:
What happens if the number is too large? Is this a dense matrix we are
talking about?
Would it work to make it a random access sparse matrix with very, very
large bounds?
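The sparse idea in the question above can be sketched as follows. This is
an illustration only, assuming a hash map keyed by (topic, word) and a
default of 0.0 for cells never written. The class name is hypothetical; it
is not Mahout's matrix API and not the LDAState code.

```java
import java.util.HashMap;
import java.util.Map;

public class SparseTopicWordSketch {
    private final int numTopics;
    private final int numWords; // logical upper bound; may be very large
    private final Map<Long, Double> cells = new HashMap<>();

    SparseTopicWordSketch(int numTopics, int numWords) {
        this.numTopics = numTopics;
        this.numWords = numWords;
    }

    /** Maps (topic, word) to a single long key after a bounds check. */
    private long key(int topic, int word) {
        if (topic < 0 || topic >= numTopics || word < 0 || word >= numWords) {
            throw new IndexOutOfBoundsException("(" + topic + ", " + word + ") is out of bounds");
        }
        return (long) topic * numWords + word;
    }

    void set(int topic, int word, double value) {
        cells.put(key(topic, word), value);
    }

    /** Cells that were never set read as 0.0 (an assumption of this sketch). */
    double get(int topic, int word) {
        return cells.getOrDefault(key(topic, word), 0.0);
    }

    /** Memory use tracks the number of stored cells, not the bounds. */
    int storedCells() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseTopicWordSketch m = new SparseTopicWordSketch(20, Integer.MAX_VALUE);
        m.set(3, 2_000_000_000, -1.5);
        System.out.println(m.get(3, 2_000_000_000) + " stored in " + m.storedCells() + " cell(s)");
    }
}
```

Only cells that are actually written take space, so an overly generous
bound costs nothing until something iterates over the whole range.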
On Sun, May 23, 2010 at 10:29 AM, Jeff Eastman wrote:
I agree it is not very friendly. It is impossible to tell the correct
value during options processing. It needs to be >= the actual number of
unique terms in the corpus, and that is hard to anticipate, though I
think it is known in seq2sparse. If it turns out to be the dictionary
size (I'm investigating), then it could be computed by adding a
dictionary path argument instead of the current option. Trouble with
that is the dictionary is not needed for anything else by LDA.
On 5/23/10 9:38 AM, Sean Owen wrote:
Is there a way to catch that with a more descriptive error earlier? I
always think AIOOBE looks bad.
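One possible shape for such a check, as a minimal sketch: validate each
term index against numWords before it is used as an array index, and fail
with a message that names the option. The class and method names are
hypothetical, and where such a check would sit in the LDA code is an
assumption.

```java
public class NumWordsCheckSketch {

    /** Fails fast with a descriptive message instead of a bare AIOOBE. */
    static void checkTermIndex(int termIndex, int numWords) {
        if (termIndex < 0 || termIndex >= numWords) {
            throw new IllegalArgumentException(
                "Term index " + termIndex + " is outside [0, " + numWords
                + "); --numWords must be at least the number of unique terms in the corpus");
        }
    }

    public static void main(String[] args) {
        checkTermIndex(10, 50000); // within bounds, passes silently
        try {
            checkTermIndex(50000, 50000); // one past the last valid index
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```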
On May 23, 2010 4:11 PM, "Jeff Eastman" wrote:
Yes, your -numWords option is set too low and that's causing the array
exception. Try -v 50000.
On 5/23/10 3:20 AM, 杨杰 wrote:
Jeff and Robin,
Thank you for your suggestion! There is anot...