We don't currently have inference on unseen documents as part of the mahout shell script, but there is a method you can use with very little modification:
LDADriver.computeDocumentTopicProbabilities()

It takes a SequenceFile with keys of any type and VectorWritable values. As long as the new data set was vectorized with the same dictionary file as the original training corpus, this should work just fine. It writes out a SequenceFile with the same keys as the input, whose values are VectorWritables giving p(topic|document) across all topics.

On Fri, Jul 22, 2011 at 2:41 PM, Ted Dunning <[email protected]> wrote:

> Not in the same form for LDA.
>
> You can definitely use LDA to build feature vectors and then classifier
> using those features using OnlineLogisticRegression.
>
> On Fri, Jul 22, 2011 at 12:56 AM, jun li <[email protected]> wrote:
>
> > I found in lingpipe book, there is a ldaclassifer which just load trained
> > model and symbol table ( id mapping to word string) and classify new
> > document?
> >
> > can lda in mahout providing the same function or command ?
> >
> > thanks.
> >
> > --
> > Li Jun
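To give a feel for what a p(topic|document) vector for an unseen document is, and why the dictionary has to match, here is a minimal plain-Java sketch. It is not Mahout code and not the inference LDADriver performs: it is a simplified EM "fold-in" that keeps the trained p(word|topic) values fixed and only estimates the new document's topic mixture. The class name TopicFoldIn, the method infer, and the parameters wordGivenTopic and termCounts are all made up for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Mahout.
public class TopicFoldIn {

    /**
     * Estimates p(topic | document) for one unseen document.
     *
     * wordGivenTopic[k][w] = p(word w | topic k), taken as fixed (already trained).
     * termCounts[w]        = count of word w in the new document. The index w must
     *                        come from the same dictionary used for training,
     *                        otherwise the counts line up with the wrong words.
     */
    public static double[] infer(double[][] wordGivenTopic, double[] termCounts, int iterations) {
        int numTopics = wordGivenTopic.length;
        double[] theta = new double[numTopics];
        Arrays.fill(theta, 1.0 / numTopics); // start from a uniform topic mixture

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[numTopics];
            double total = 0.0;
            for (int w = 0; w < termCounts.length; w++) {
                if (termCounts[w] == 0.0) {
                    continue; // word does not occur in this document
                }
                // Normalizer: probability of word w under the current mixture.
                double norm = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    norm += theta[k] * wordGivenTopic[k][w];
                }
                if (norm == 0.0) {
                    continue; // word has zero probability under every topic
                }
                // E-step: split this word's count across topics by responsibility.
                for (int k = 0; k < numTopics; k++) {
                    next[k] += termCounts[w] * theta[k] * wordGivenTopic[k][w] / norm;
                }
                total += termCounts[w];
            }
            if (total == 0.0) {
                return theta; // no usable words; keep the current mixture
            }
            // M-step: renormalize so the mixture sums to 1.
            for (int k = 0; k < numTopics; k++) {
                next[k] /= total;
            }
            theta = next;
        }
        return theta;
    }

    public static void main(String[] args) {
        // Two toy topics over a three-word dictionary (made-up numbers).
        double[][] wordGivenTopic = {
            {0.7, 0.2, 0.1},
            {0.1, 0.2, 0.7}
        };
        double[] newDocument = {5, 1, 0}; // mostly word 0, so topic 0 should dominate
        System.out.println(Arrays.toString(infer(wordGivenTopic, newDocument, 50)));
    }
}
```

The returned array plays the same role as one output VectorWritable described above: one entry per topic, summing to 1. A vector like that is what you would hand to a classifier as features.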
