Re: LDA in Mahout

Neal Richter Thu, 06 Jan 2011 13:04:37 -0800

>
>
> My point is exactly that this evaluation will lead to nonsense.  The size of
> the extracted topics vector isn't even necessarily the same as the size of
> the labels vector.  There is also no guarantee that it would be in the same
> order.
>
>
Order need not matter in the comparison.  I'm proposing a simple metric that
is NOT great from a theoretical perspective:


Intersection(Document.LabelsVector, Document.ExtractedTopicsVector).Count()
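A minimal Python sketch of that overlap count follows. It assumes each extracted topic has already been mapped to a string name (for example its top-weighted term) that can be compared directly with the human tags. That mapping step and the helper names label_topic_overlap and mean_overlap are hypothetical, not anything in Mahout. Treating both vectors as sets is what makes the count insensitive to the size and order mismatch raised above.

```python
from typing import Iterable, List, Tuple


def label_topic_overlap(labels: Iterable[str], extracted_topics: Iterable[str]) -> int:
    """Count human labels that also appear among the extracted topic names.

    Both inputs are treated as sets, so vector length, ordering and
    duplicates do not affect the result.
    """
    return len(set(labels) & set(extracted_topics))


def mean_overlap(corpus: List[Tuple[List[str], List[str]]]) -> float:
    """Average per-document overlap over (labels, extracted_topics) pairs.

    Hypothetical corpus format: one (labels, topic_names) pair per document.
    Returns 0.0 for an empty corpus.
    """
    if not corpus:
        return 0.0
    total = sum(label_topic_overlap(labels, topics) for labels, topics in corpus)
    return total / len(corpus)
```

For example, label_topic_overlap(["sports", "football", "europe"], ["football", "league", "sports"]) gives 2. As the quoted objection notes, this only means something if topic names and tags live in the same vocabulary.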



> What you need is one extra step where you build a supervised classifier
> using the extracted topics vector to predict the label.  The accuracy of
> this supervised classifier is a measure of how well the extracted topics
> encodes the information in the labels.
>
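The supervised check described in the quote could be sketched like this. It assumes each document has a dense topic-proportion vector (for instance LDA's per-document topic distribution) and a single human label. A nearest-centroid classifier with leave-one-out accuracy is swapped in purely for brevity; any supervised classifier would serve, and all names here are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _centroids(data: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Mean topic vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in data:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def _predict(centroids: Dict[str, List[float]], vec: Vector) -> str:
    """Label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))


def leave_one_out_accuracy(data: Sequence[Tuple[Vector, str]]) -> float:
    """Hold out each document, train on the rest, and score the prediction.

    Higher accuracy suggests the topic vectors carry more of the label
    information. Requires at least two documents.
    """
    docs = list(data)
    if len(docs) < 2:
        raise ValueError("need at least two documents")
    correct = 0
    for i, (vec, label) in enumerate(docs):
        train = docs[:i] + docs[i + 1:]
        if _predict(_centroids(train), vec) == label:
            correct += 1
    return correct / len(docs)
```

Unlike the set intersection, this does not require topic ids and labels to share a vocabulary; it only asks whether the topic vectors separate the labels.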

Why not mix it in and perform transductive learning then?

I did not intend to propose a theoretically sound way to test LDA as an
extractor/labeler of human tags.  The intent was a simple suggestion: a
quick-n-dirty test to see how much LDA-extracted topics overlap with human
tags on a well-tagged document set.

- Neal
