Re: [Scikit-learn-general] NMF for document clustering - how to label the documents

Olivier Grisel Mon, 26 Aug 2013 01:36:11 -0700

2013/8/21 Norman Rosner <[email protected]>:
> Hi readers of the sklearn mailing list,
>
> I'm a noob in terms of a lot of stuff that sklearn handles, machine
> learning and the like.
> As it happens I'm currently writing my thesis with the help of
> sklearn, specifically I'm using NMF. I want to compare NMF to kmeans
> based clustering. As far as I get it NMF produces two matrices (term X
> topic/component and topic/component X document) and NMF.components_
> returns the former of the matrices. Is there any way I could get the
> latter matrix since I want to have a label assigned to each document?
>
> Any help is very appreciated!



For non-negative text features, just computing the dot product of a
document with each component will give you a non-negative value, akin to
a cosine similarity, that should grow with how well the document matches
that component. Hence you can take the top components based on that
similarity score to get cluster assignments.
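
Here is a minimal sketch of that idea, assuming a reasonably recent
scikit-learn. The toy corpus, the variable names and n_components=2 are
made up for illustration. Note that fit_transform also returns the
document x component matrix asked about in the question, so both scores
are shown side by side.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, purely illustrative.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold stocks as markets fell",
]

# TF-IDF features are non-negative, as NMF requires.
X = TfidfVectorizer().fit_transform(docs)

model = NMF(n_components=2, random_state=0, max_iter=500)

# W has shape (n_documents, n_components): the per-document weights.
W = model.fit_transform(X)

# components_ has shape (n_components, n_terms).
H = model.components_

# Dot product of every document with every component.
scores = np.asarray(X.dot(H.T))

# Hard label per document: the component with the highest score.
labels_from_scores = scores.argmax(axis=1)
labels_from_W = W.argmax(axis=1)

print(labels_from_scores, labels_from_W)
```

The two label vectors need not agree on every document; which one is
more useful for a given corpus is something to check empirically.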

Furthermore, NMF is good at finding components that act as a soft
clustering of the documents: each document will typically be assigned
to a few topics / clusters rather than just one, as with hard
clustering algorithms like k-means.
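
To make the soft vs. hard distinction concrete, here is a small sketch
using only NumPy. The weight matrix W below is hand-written for
illustration (in practice it would come from NMF), and the 0.3 threshold
is an arbitrary assumption, not a recommended value.

```python
import numpy as np

# Hypothetical document x component weights (rows = documents).
W = np.array([
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
    [0.0, 0.2, 0.6],
])

# Normalize each row so the weights read as shares of the document.
row_sums = np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
shares = W / row_sums

# Soft assignment: keep every component whose share reaches the threshold.
threshold = 0.3  # arbitrary cut-off, chosen for the example
soft = [np.flatnonzero(row >= threshold).tolist() for row in shares]

# Hard assignment: a single best component per document.
hard = shares.argmax(axis=1)

print(soft)           # [[0], [0, 1], [2]]
print(hard.tolist())  # [0, 0, 2]
```

The second document ends up in two clusters under the soft assignment,
while the hard assignment keeps only its strongest component.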

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
