> My point is exactly that this evaluation will lead to nonsense. The size
> of the extracted topics vector isn't even necessarily the same as the
> size of the labels vector. There is also no guarantee that it would be
> in the same order.

If order is not important in the comparison, I'm proposing a simple metric that is NOT great from a theory perspective:

    Intersection(Document.LabelsVector, Document.ExtractedTopicsVector).Count()

> What you need is one extra step where you build a supervised classifier
> using the extracted topics vector to predict the label. The accuracy of
> this supervised classifier is a measure of how well the extracted topics
> encodes the information in the labels.

Why not mix it in and perform transductive learning then?

I did not intend to propose a theoretically sound way to test LDA as an extractor/labeler of human tags. The intent was simply to suggest a quick-n-dirty test to see how much LDA-extracted topics and human tags overlap on a well-tagged document set.

- Neal
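
For concreteness, the overlap count above could be sketched roughly like this in Python. This is only an illustration under a big assumption: that each extracted topic has already been reduced to a term that can be compared to a human tag as a string (for example, a topic's top word). The names `tag_overlap`, `labels`, and `extracted_topics` are hypothetical and not from any library.

```python
def tag_overlap(labels, extracted_topics):
    """Count terms that appear in both the human tags and the extracted topics.

    Assumption: both inputs are iterables of strings, and a case-insensitive
    exact match is good enough for a quick-n-dirty check. Order and vector
    length do not matter because both sides are treated as sets.
    """
    label_set = {term.strip().lower() for term in labels}
    topic_set = {term.strip().lower() for term in extracted_topics}
    return len(label_set & topic_set)


# Made-up example document: three human tags, four extracted topic terms.
labels = ["Sports", "politics", "economy"]
extracted_topics = ["economy", "sports", "weather", "health"]
overlap = tag_overlap(labels, extracted_topics)
```

Because the vectors are treated as sets, differing lengths and ordering are not an issue for this count, though it says nothing about topics that carry the same information under a different name.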
