Re: [scikit-learn] LatentDirichletAllocation failing to find topics in NLTK Gutenberg corpus?

2017-09-20 Thread Markus Konrad
I tried it with 12 topics (that's the number that minimized the log likelihood) and there were also some very general topics. But unlike sklearn's implementation, the Gibbs sampling didn't extract "empty topics" (those with all weights equal to `topic_word_prior`). This is what puzzled me. I
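For readers following along, here is a minimal sketch of how such "empty topics" could be flagged, assuming the topic-word weights are available as rows of numbers and the prior is a single scalar. The helper name `find_empty_topics` and the toy matrix are made up for illustration and are not from the thread.

```python
import math


def find_empty_topics(topic_word, prior, rel_tol=1e-6):
    """Return indices of topics whose every word weight equals the prior.

    topic_word: iterable of rows, one row of word weights per topic.
    prior: the scalar topic-word prior the weights are compared against.
    """
    empty = []
    for k, row in enumerate(topic_word):
        # A topic counts as "empty" if no word weight moved away from the prior.
        if all(math.isclose(w, prior, rel_tol=rel_tol) for w in row):
            empty.append(k)
    return empty


# Hypothetical toy data: 3 topics over 4 words, with a prior of 0.25.
prior = 0.25
topic_word = [
    [4.25, 0.25, 1.25, 0.25],  # some words received mass
    [0.25, 0.25, 0.25, 0.25],  # every weight still equals the prior
    [0.25, 2.25, 0.25, 3.25],
]

print(find_empty_topics(topic_word, prior))
```

With a fitted sklearn model, the same check could presumably be run on `components_` with the fitted `topic_word_prior_` value, but that pairing is an assumption here and worth verifying against the installed version.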

Re: [scikit-learn] LatentDirichletAllocation failing to find topics in NLTK Gutenberg corpus?

2017-09-20 Thread Markus Konrad
Sorry, I meant of course "the number that *maximized* the log likelihood" in the first sentence...
On 09/20/2017 09:18 AM, Markus Konrad wrote:
> I tried it with 12 topics (that's the number that minimized the log
> likelihood) and there were also some very general topics. But the Gibbs
> samplin

Re: [scikit-learn] Accessing Clustering Feature Tree in Birch

2017-09-20 Thread Sema Atasever
I need this information to use it in a scientific study and I think that a function interface would make this easier. Thank you for your answer.
On Sat, Sep 16, 2017 at 1:53 PM, Joel Nothman wrote:
> There is no such thing as "the data samples in this cluster". The point of
> Birch being online