On Sat, Sep 22, 2012 at 12:49 PM, chyi-kwei yau <[email protected]> wrote:
> Hi,
> You should be able to run inference on a test data set,
> and use perplexity of the test set to measure the performance of your
> model.
>
> Check the LDA paper here and see the details:
> http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf

The current LDA implementation in Mahout has a command-line option,
--test_set_percentage, to hold out some of your training data as a "test
set" which is used to measure held-out perplexity during training. The
command-line option --iteration_block_size sets the training to compute
held-out perplexity after this many iterations (so if you set this to 10,
then held-out perplexity is only computed every 10 iterations over the
input data).

The perplexity is logged to the console during training, and is also
persisted in sequence files parallel with the model files (in a directory
like $OUTPUT_DIR/perplexity-$ITERATION_NUMBER or something like that). So
this will tell you how well converged you are, and how likely your test
data would be to have been generated by your model, if that is a test
you'd find useful.

>
> Best,
> Chyi-Kwei
>
> On Sat, Sep 22, 2012 at 2:51 PM, Jake Mannix <[email protected]>
> wrote:
>
> > What would you want a test to tell you? LDA is unsupervised, so it'll
> > give you the word-topic probabilities, and for each test document (or
> > training document) you can get the document-topic probabilities as
> > well. Then... what would you like to know at that point?
> >
> > On Sat, Sep 22, 2012 at 10:00 AM, vineeth <[email protected]>
> > wrote:
> >
> >> Hello,
> >>
> >> I am searching for how to run Mahout LDA on a test data set to detect
> >> the topics. Is there a way to test the trained LDA model? Or should we
> >> write our own program based on the word-topic probabilities that the
> >> LDA spits out after running on the test data?
> >>
> >> Thanks
> >> Vineeth
> >
> > --
> >
> > -jake
>

--
-jake
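[Editorial note appended to the archived thread.] For readers unfamiliar with the perplexity measure the thread refers to: per the Blei, Ng & Jordan (2003) paper linked above, held-out perplexity is exp of the negative total log-likelihood of the test documents divided by the total number of test tokens; lower values indicate a better fit. A minimal sketch of that formula follows; the function name and the toy numbers are made up for illustration and are not part of Mahout's API.

```python
import math

def held_out_perplexity(doc_log_likelihoods, doc_token_counts):
    """Held-out perplexity as defined in Blei, Ng & Jordan (2003):
    exp(-(sum of per-document log-likelihoods) / (total token count)).
    Lower is better; it should fall as training converges."""
    total_log_lik = sum(doc_log_likelihoods)
    total_tokens = sum(doc_token_counts)
    return math.exp(-total_log_lik / total_tokens)

# Toy example: three held-out documents with hypothetical
# log p(words | model) values and token counts.
log_liks = [-350.0, -420.5, -180.25]
counts = [100, 120, 50]
print(held_out_perplexity(log_liks, counts))
```

This is the same quantity Mahout reports to the console (and writes to the perplexity sequence files) during training, so a curve of these values over iterations shows how converged the model is.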
