I used several datasets, each with between 2200 and 3500 distinct words in the 
tf (term-frequency) matrix used for training the LDA. The data are lemmatized 
before being passed to CountVectorizer.
________________________________
From: Joel Nothman [joel.noth...@gmail.com]
Sent: Tuesday, 26 January 2016 23:35
To: scikit-learn-general
Subject: Re: [Scikit-learn-general] Latent Dirichlet Allocation

How many distinct words are in your dataset?

On 27 January 2016 at 00:21, Rockenkamm, Christian 
<c.rockenk...@stud.uni-goettingen.de> wrote:
Hello,

I have a question concerning Latent Dirichlet Allocation. The results I get 
from using it are a bit confusing.
I start with about 3000 documents. When preparing them with CountVectorizer, 
I use the following parameters: max_df=0.95 and min_df=0.05.
For the LDA fit I use the batch learning method. For the other parameters I 
have tried many different values. However, regardless of which configuration 
I used, I face one common problem: I get topics that are never used in any of 
the documents, and these topics all show the same structure (the same 
topic-word distribution). I even tried gensim with the same configuration as 
scikit-learn, yet I still encountered this problem. I also tried lowering the 
number of topics in the model, but this did not lead to the expected results 
either. With 100 topics, 20-27 were still affected by this problem; with 50 
topics, 2-8 were still affected, depending on the parameter settings.
Does anybody have an idea as to what might be causing this problem and how to 
resolve it?
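
For readers who want to reproduce the symptom, here is a minimal sketch of the 
setup described above. It is not the original code: the toy corpus, the helper 
name find_unused_topics, n_components=10 and random_state=0 are all assumptions 
made for illustration, and it uses the current scikit-learn parameter name 
n_components rather than the one available in 2016. It treats a topic as 
"unused" when it is not the dominant topic of any document, which is only one 
possible reading of "never used in any of the docs".

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def find_unused_topics(doc_topic):
    """Return indices of topics that are not the dominant topic of any document.

    Hypothetical helper: `doc_topic` is an (n_documents, n_topics) array of
    per-document topic weights.
    """
    n_topics = doc_topic.shape[1]
    dominant = np.argmax(doc_topic, axis=1)   # strongest topic per document
    used = np.unique(dominant)
    return np.setdiff1d(np.arange(n_topics), used)


# Toy corpus, made up for illustration (the post describes about 3000 documents).
docs = [
    "apple banana fruit juice",
    "banana fruit salad apple",
    "engine wheel car road",
    "car road engine driver",
] * 5

# Same document-frequency cut-offs as in the post.
vectorizer = CountVectorizer(max_df=0.95, min_df=0.05)
tf = vectorizer.fit_transform(docs)
print("distinct words:", len(vectorizer.vocabulary_))

# Batch learning, as in the post; far more topics than this corpus needs.
lda = LatentDirichletAllocation(
    n_components=10, learning_method="batch", random_state=0
)
doc_topic = lda.fit_transform(tf)

unused = find_unused_topics(doc_topic)
print("topics never dominant in any document:", unused)

# Spread of the topic-word weights for those topics; values close to zero
# would mean the rows are nearly flat, i.e. they share the same structure.
print("std of topic-word weights:", lda.components_[unused].std(axis=1))
```

Comparing the number of distinct words with the number of requested topics, as 
asked in the reply, is a quick first check when many topics come out unused.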

Best regards,
Christian Rockenkamm

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

