What exactly is this probability distribution? For each word in your vocabulary, it gives the probability that a word drawn at random from the topic is that word. Another way to visualise it is as a 2-column table where the 1st column is a word in your vocabulary and the 2nd column is the probability of that word appearing. Every value in the 2nd column must be >= 0, and the values must sum to 1; that is the definition of a probability distribution.
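If it helps to see that concretely, here is a minimal sketch building on the docs example (run in spark-shell, where `sc` already exists; the data file and k=3 are taken straight from the docs). I normalise each column explicitly, since depending on the optimiser the entries of topicsMatrix may be weights rather than probabilities:

```scala
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// Same setup as the docs example: each line of the file is a word-count vector
val data = sc.textFile("data/mllib/sample_lda_data.txt")
val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
val corpus = parsedData.zipWithIndex.map(_.swap).cache()

val ldaModel = new LDA().setK(3).run(corpus)

// topicsMatrix is vocabSize x k: one column per topic, one row per word
val topics = ldaModel.topicsMatrix
for (topic <- 0 until 3) {
  // Pull out one topic's column and normalise it so the "2-column table"
  // property holds explicitly: every entry >= 0, entries summing to 1
  val weights = (0 until ldaModel.vocabSize).map(word => topics(word, topic))
  val probs   = weights.map(_ / weights.sum)
  println(s"Topic $topic: min prob = ${probs.min}, sum = ${probs.sum}")
}
```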
Clearly, for the idea of topics to be useful at all, you want different topics to exhibit different probability distributions, i.e. some words should be more likely in one topic than in another.

How does it actually infer words and topics? Probably a good idea to Google that one if you really want to understand the details; there are some great resources available.

How can I connect the output to the actual words in each topic? A typical way is to look at the top 5, 10 or 20 words in each topic and use those to infer something about what the topic represents (e.g. via MLlib's describeTopics; there is a sketch after the quoted message below).

-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

> On 3 Dec 2015, at 05:07, Nguyen, Tiffany T <nguye...@grinnell.edu> wrote:
>
> Hello,
>
> I have been trying to understand the LDA topic modeling example provided here:
> https://spark.apache.org/docs/latest/mllib-clustering.html#latent-dirichlet-allocation-lda
> In the example, they load word count vectors from a text file and then output the topics, which are represented as probability distributions over words. What exactly is this probability distribution? How does it actually infer words and topics, and how can I connect the output to the actual words in each topic?
>
> Thanks!
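As promised above, here is a rough sketch of the top-words approach using describeTopics on the model fitted earlier. Note that vocabArray is an assumption: MLlib's LDA only ever sees word indices, so you have to keep the index-to-word mapping from whatever step built your count vectors. The dummy mapping below just stands in for it so the sketch runs:

```scala
// The real vocabulary mapping has to come from your own vectorisation step;
// this placeholder (word0, word1, ...) only makes the sketch self-contained
val vocabArray: Array[String] =
  (0 until ldaModel.vocabSize).map(i => s"word$i").toArray

// describeTopics returns, per topic, term indices and weights sorted by weight
val topTopics = ldaModel.describeTopics(maxTermsPerTopic = 10)
topTopics.zipWithIndex.foreach { case ((terms, weights), i) =>
  println(s"TOPIC $i")
  terms.zip(weights).foreach { case (term, weight) =>
    println(f"  ${vocabArray(term)}%-15s $weight%.4f")
  }
}
```

Eyeball the top 10 or so words per topic and you can usually put a human-readable label on what the topic represents.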