Re: [scikit-learn] LatentDirichletAllocation failing to find topics in NLTK Gutenberg corpus?

2017-09-19 Thread Markus Konrad
This is indeed interesting. I didn't know that there are such big differences between these approaches. I split the 18 documents into sub-documents of 5 paragraphs each, which gave me around 10k sub-documents. Now scikit-learn and gensim deliver much better results, quite similar to those f
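
For readers who want to try the splitting step described above, here is a minimal sketch. It is not the code used in the thread: the helper name `split_into_subdocuments`, the blank-line paragraph rule, and the toy document are all assumptions made for illustration. The resulting chunks could then be passed to scikit-learn's `CountVectorizer` and `LatentDirichletAllocation` as separate documents.

```python
import re


def split_into_subdocuments(text, paragraphs_per_chunk=5):
    """Split raw text into sub-documents of a fixed number of paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    The last sub-document may contain fewer paragraphs than requested.
    """
    # Break the text on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Group consecutive paragraphs into chunks of the requested size.
    return [
        "\n\n".join(paragraphs[i:i + paragraphs_per_chunk])
        for i in range(0, len(paragraphs), paragraphs_per_chunk)
    ]


# Hypothetical toy document with 12 paragraphs (not the Gutenberg data).
toy_doc = "\n\n".join(f"Paragraph {n} text." for n in range(1, 13))
chunks = split_into_subdocuments(toy_doc)
print(len(chunks))  # 12 paragraphs in groups of 5 gives 3 chunks (5 + 5 + 2)
```

Applying such a function to each of the original documents and concatenating the results yields many short documents instead of a few long ones, which is the change in input data discussed in this thread.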

Re: [scikit-learn] LatentDirichletAllocation failing to find topics in NLTK Gutenberg corpus?

2017-09-19 Thread Andreas Mueller
I'm actually surprised the Gibbs sampling gave useful results with so little data. And splitting the documents results in very different data, which has a lot more information. How many topics did you use? Also: PR for docs welcome! On 09/19/2017 04:26 AM, Markus Konrad wrote: This is indeed int