The top few coefficients are in lda.topToWor.txt The rest of it is probably in lda.top
On Wed, Jul 20, 2011 at 12:52 PM, Ian Upright <[email protected]>wrote: > Hi, > > This is a little off topic, but perhaps someone on this list may be able to > comment. > > I'm still fairly new to LDA, and I've been playing with Yahoo's LDA > implementation. > > The Yahoo code produces a file called: > > lda.worToTop.txt > > www.teddybears.com/ recreation/toys (teddy,15) (bears,15) (enjoy,2) > (teddy,15) (bears,15) (enjoy,2) (featuring,41) (teddy,15) > www.bearsbythesea.com/ recreation/toys (teddy,99) (bear,99) (store,81) > (pismo,30) (beach,88) (california,24) (specialize,99) (muffy,99) (store,11) > (complete,11) (collections,46) (checkout,84) (web,87) > > So this shows that teddy is in topic 15 adn in topic 99. > > However, what I thought I would be looking for, is a vector, whereby each > word is defined as a set of probabilities into a particular topic. (eg, > with 600 topics I could have a vector that maps that word into each of > those > 600 topics) > > This vector could then be used for calculating similarity against other > words, etc. Is the correct idea? > > If so, using the Yahoo LDA output, for each unique word, I have to > calculate > that vector and probability myself, using the above file? Perhaps I'm > missing something? > > Thanks, Ian > -- Yee Yang Li Hector http://hectorgon.blogspot.com/ (tech + travel) http://hectorgon.com (book reviews)
