I have done this before (querying with a new document). The problem I ran into is that some steps, such as calculating the token counts and TF-IDF, depend on MapReduce jobs, and I don't think that is a good fit for a single new document. You need to use the dictionary to check whether the document contains any new words, and then calculate the word vector and the topic vector.
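To make the dictionary step concrete, here is a minimal sketch in plain Java, without MapReduce. It is not Mahout's API: the class name NewDocVectorizer, the in-memory maps for the dictionary and document frequencies, the simple tokenizer, and the tf * ln(N / df) weighting are all assumptions for illustration, and Mahout's own weighting may differ. Words that are not in the training dictionary are simply dropped. Inferring the topic vector from the resulting word vector is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout: vectorizes one new document
// against a dictionary that was built from the training corpus.
class NewDocVectorizer {

    // Count occurrences of each known word; words missing from the
    // dictionary (new words) are skipped.
    public static Map<Integer, Integer> termCounts(String text, Map<String, Integer> dictionary) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            Integer termId = dictionary.get(token);
            if (termId != null) {
                counts.merge(termId, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Weight each count with a plain tf * ln(numDocs / df) formula (an assumption).
    // docFreq maps term id to the number of training documents containing the term.
    public static Map<Integer, Double> tfidf(Map<Integer, Integer> counts,
                                             Map<Integer, Integer> docFreq,
                                             int numDocs) {
        Map<Integer, Double> weights = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 1);
            weights.put(e.getKey(), e.getValue() * Math.log((double) numDocs / df));
        }
        return weights;
    }

    public static void main(String[] args) {
        // Toy dictionary and document frequencies standing in for the training output.
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("topic", 0);
        dictionary.put("model", 1);
        Map<Integer, Integer> docFreq = new HashMap<>();
        docFreq.put(0, 10);
        docFreq.put(1, 100);

        Map<Integer, Integer> counts = termCounts("Topic model, topic unseen", dictionary);
        System.out.println(counts);
        System.out.println(tfidf(counts, docFreq, 100));
    }
}
```

The sparse maps (term id to weight) could then be copied into whatever vector type the topic inference code expects.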
On February 29, 2012 at 7:23 PM, Dimitri Goldin <[email protected]> wrote:
> Hello,
>
> We are currently trying to evaluate mahout-0.6's LDA implementation for a
> couple of our use-cases.
> One of those is the assignment of topic probabilities to new documents,
> that is, documents not contained in the training corpus.
> After a little bit of research we found that the
> LDADriver.computeDocumentTopicProbabilities
> method might be a good starting point. This method is private though.
> Another problem would be the creation
> of a vector from a new document using the same dictionary. It seems that
> SparseVectorsFromSequenceFiles
> only supports "collective" creation of vectors.
>
> Is there maybe already something implemented, which I might have overlooked, to
> accomplish one or both steps?
> I would be thankful for any suggestions and hints before I start
> implementing something myself.
>
> Thanks,
> Dimitri
>
> --
> Neofonie GmbH
> Robert-Koch-Platz 4
> 10115 Berlin
> T +49.30 24627-241
> F +49.30 24627 120
> [email protected]
> http://www.neofonie.de
>
> Handelsregister
> Berlin-Charlottenburg: HRB 67460
>
> Geschäftsführung
> Thomas Kitlitschko
