Re: Inference of topic probabilities to new documents using LDA

chenghao liu Wed, 29 Feb 2012 07:43:07 -0800

I have done that before (querying with a new document). The problem I ran
into is that some of the work, like calculating the token counts and
TF-IDF, depends on MapReduce jobs, which I don't think are a good fit for
a single new document. You need to use the dictionary to check whether
the document contains any new words, and then calculate the word vector
and the topic vector.
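A minimal sketch of the first step for one document, without MapReduce, under stated assumptions. It assumes the training dictionary has already been loaded into a plain `Map<String, Integer>` (term to term id) and, for TF-IDF, that the training document frequencies and corpus size are available. The class and method names (`NewDocVectorizer`, `termFrequencies`, `tfidf`) and the naive tokenizer are hypothetical illustrations, not Mahout API. In practice the tokenizer must match the analyzer that built the dictionary. LDA is normally fed raw term counts rather than TF-IDF weights, so `tfidf` is only needed if your pipeline uses it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NewDocVectorizer {

    /**
     * Builds a sparse term-count vector (termId -> count) for one new document
     * using a dictionary fixed at training time. Tokens missing from the
     * dictionary are skipped, since the trained model has no parameters for them.
     */
    static Map<Integer, Double> termFrequencies(String text, Map<String, Integer> dictionary) {
        Map<Integer, Double> tf = new HashMap<>();
        // Hypothetical naive tokenizer: lowercase, then split on runs of non-letters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            Integer id = dictionary.get(token);
            if (id == null) {
                continue; // out-of-vocabulary (or empty) token: drop it
            }
            tf.merge(id, 1.0, Double::sum);
        }
        return tf;
    }

    /** Reweights counts by idf = log(numDocs / df), using training-corpus statistics. */
    static Map<Integer, Double> tfidf(Map<Integer, Double> tf, Map<Integer, Integer> docFreq, int numDocs) {
        Map<Integer, Double> weights = new HashMap<>();
        for (Map.Entry<Integer, Double> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 1); // assume df >= 1 for known terms
            weights.put(e.getKey(), e.getValue() * Math.log((double) numDocs / df));
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = Map.of("topic", 0, "model", 1, "mahout", 2);
        // "inference" is not in the dictionary, so it does not appear in the vector.
        Map<Integer, Double> tf = termFrequencies("Mahout topic model, topic inference", dictionary);
        System.out.println(tf);
    }
}
```

Looking words up in a fixed map is the whole point: a new document cannot add entries to the dictionary, so unknown words are simply ignored rather than triggering a rebuild of the vectors.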


On 29 February 2012 at 7:23 PM, Dimitri Goldin <[email protected]> wrote:
> Hello,
>
> We are currently trying to evaluate mahout-0.6's LDA implementation for a
> couple of our use cases.
> One of those is assigning topic probabilities to new documents, that is,
> documents not contained in the training corpus.
> After a little research we found that the
> LDADriver.computeDocumentTopicProbabilities
> method might be a good starting point. This method is private, though.
> Another problem would be creating a vector from a new document using the
> same dictionary. It seems that SparseVectorsFromSequenceFiles
> only supports "collective" creation of vectors.
>
> Is there perhaps something already implemented, which I might have
> overlooked, to accomplish one or both steps?
> I would be thankful for any suggestions and hints before I start
> implementing something myself.
>
> Thanks,
>    Dimitri
>
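For the other step in the question, computing topic probabilities for a document outside the training corpus, one common approach is EM "folding-in": keep the trained topic-word distributions fixed and iterate only the new document's topic proportions. This is a swapped-in technique, not a reproduction of the private `LDADriver.computeDocumentTopicProbabilities`. The sketch assumes the trained model has been exported as a dense matrix `phi[k][w] = p(w | topic k)` with rows summing to 1 (how to export it from Mahout is not shown). `TopicFoldIn`, `inferTopics`, and the smoothing parameter `alpha` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;

public class TopicFoldIn {

    /**
     * Estimates topic proportions for one unseen document by EM folding-in.
     * counts: sparse term counts (termId -> count) built with the training dictionary.
     * phi:    phi[k][w] = p(word w | topic k), assumed taken from the trained model.
     * alpha:  symmetric smoothing added to every topic's expected count.
     */
    static double[] inferTopics(Map<Integer, Double> counts, double[][] phi, double alpha, int iters) {
        int numTopics = phi.length;
        double[] theta = new double[numTopics];
        Arrays.fill(theta, 1.0 / numTopics); // start from a uniform topic mixture
        double[] resp = new double[numTopics];

        for (int it = 0; it < iters; it++) {
            double[] expected = new double[numTopics];
            for (Map.Entry<Integer, Double> e : counts.entrySet()) {
                int w = e.getKey();
                // E-step: p(topic k | word w) is proportional to theta[k] * phi[k][w]
                double norm = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    resp[k] = theta[k] * phi[k][w];
                    norm += resp[k];
                }
                if (norm == 0.0) {
                    continue; // word has zero probability under every topic
                }
                for (int k = 0; k < numTopics; k++) {
                    expected[k] += e.getValue() * resp[k] / norm;
                }
            }
            // M-step: smoothed expected topic counts, renormalized to sum to 1
            double denom = numTopics * alpha;
            for (double x : expected) {
                denom += x;
            }
            if (denom == 0.0) {
                break; // nothing to estimate from; keep the current theta
            }
            for (int k = 0; k < numTopics; k++) {
                theta[k] = (expected[k] + alpha) / denom;
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        // Toy model: 2 topics over a 3-word vocabulary.
        double[][] phi = {{0.7, 0.2, 0.1}, {0.1, 0.2, 0.7}};
        Map<Integer, Double> counts = Map.of(0, 3.0, 2, 1.0);
        System.out.println(Arrays.toString(inferTopics(counts, phi, 0.1, 50)));
    }
}
```

Because only `theta` is updated, the trained model is never modified, and the per-document cost is small enough to run outside MapReduce.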
