Hello,
We are currently evaluating the LDA implementation in mahout-0.6 for
a couple of our use cases.
One of them is assigning topic probabilities to new documents,
that is, documents not contained in the training corpus.
After a little research we found that the
LDADriver.computeDocumentTopicProbabilities
method might be a good starting point. However, this method is private.
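
To make the goal concrete, here is a minimal sketch in plain Java (no Mahout
classes) of the kind of inference we have in mind. It is a simple EM
"folding-in" with fixed topic-word probabilities, not necessarily the
inference Mahout runs internally, and it ignores the Dirichlet prior. The
class name FoldInSketch, the method inferTopics and the layout of the
topicWord array are our own assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Map;

public class FoldInSketch {

    /**
     * Estimates p(topic | document) for one unseen document by EM folding-in,
     * keeping the topic-word probabilities fixed.
     *
     * topicWord[k][w] is assumed to hold p(word w | topic k) from a trained model;
     * termCounts maps a term index to its count in the new document.
     */
    public static double[] inferTopics(double[][] topicWord,
                                       Map<Integer, Integer> termCounts,
                                       int iterations) {
        int numTopics = topicWord.length;
        double[] theta = new double[numTopics];
        Arrays.fill(theta, 1.0 / numTopics); // start from a uniform topic mixture

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[numTopics];
            double total = 0.0;
            for (Map.Entry<Integer, Integer> e : termCounts.entrySet()) {
                int w = e.getKey();
                int count = e.getValue();
                // E-step: how much each topic is responsible for word w
                double[] r = new double[numTopics];
                double norm = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    r[k] = theta[k] * topicWord[k][w];
                    norm += r[k];
                }
                if (norm == 0.0) {
                    continue; // word has zero probability under every topic
                }
                for (int k = 0; k < numTopics; k++) {
                    next[k] += count * r[k] / norm;
                }
                total += count;
            }
            if (total == 0.0) {
                break; // nothing usable in the document; keep the current estimate
            }
            // M-step: renormalise the expected counts into a distribution
            for (int k = 0; k < numTopics; k++) {
                next[k] /= total;
            }
            theta = next;
        }
        return theta;
    }
}
```

The returned array sums to one whenever at least one term of the document
has non-zero probability under some topic; otherwise the uniform starting
point is returned unchanged.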
Another problem is creating
a vector from a new document using the same dictionary. It seems that
SparseVectorsFromSequenceFiles
only supports "collective" creation of vectors, i.e. for a whole corpus at once.
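
As an illustration of the vectorization step, here is a minimal sketch in
plain Java, assuming the term-to-index dictionary from training has already
been loaded into a Map. The class DictionaryVectorizerSketch and the method
vectorize are hypothetical names, and the simple lower-case/split
tokenization is only a stand-in: in practice it would have to match the
analyzer used when the training vectors were built.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class DictionaryVectorizerSketch {

    /**
     * Turns one document into sparse term counts (term index -> count),
     * using only the indices of an existing dictionary.
     */
    public static Map<Integer, Integer> vectorize(String text,
                                                  Map<String, Integer> dictionary) {
        Map<Integer, Integer> counts = new TreeMap<>();
        // Stand-in tokenizer: lower-case, split on anything that is not a letter or digit
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            Integer index = dictionary.get(token);
            if (index != null) {
                counts.merge(index, 1, Integer::sum);
            }
            // tokens unknown to the training dictionary are silently dropped
        }
        return counts;
    }
}
```

Dropping out-of-dictionary terms keeps the new vector in the same index
space as the training vectors, which is what the trained model expects.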
Is there perhaps something already implemented that I might have overlooked
and that would accomplish one or both steps?
I would be thankful for any suggestions and hints before I start
implementing something myself.
Thanks,
Dimitri
--
Neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin
T +49.30 24627-241
F +49.30 24627 120
[email protected]
http://www.neofonie.de
Handelsregister
Berlin-Charlottenburg: HRB 67460
Geschäftsführung
Thomas Kitlitschko