2013/8/21 Norman Rosner <[email protected]>:
> Hi readers of the sklearn mailing list,
>
> I'm a noob in terms of a lot of stuff that sklearn handles, machine
> learning and the like.
> As it happens I'm currently writing my thesis with the help of
> sklearn, specifically I'm using NMF. I want to compare NMF to kmeans
> based clustering. As far as I get it NMF produces two matrices (term X
> topic/component and topic/component X document) and NMF.components_
> returns the former of the matrices. Is there any way I could get the
> latter matrix since I want to have a label assigned to each document?
>
> Any help is very appreciated!
For non-negative text features, just computing the dot product of a document with each component will give you a non-negative value akin to cosine similarity, which should grow as the document gets closer to that component. Hence you can take the top components based on that similarity score to get cluster assignments.

Furthermore, NMF is good at finding components that act as a soft clustering of the documents: each document will typically be assigned to a few topics / clusters rather than to just one, as with hard clustering algorithms like kmeans.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
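A minimal sketch of the dot-product approach in Python with scikit-learn. It assumes a TF-IDF document x term matrix, so rows are documents and, in scikit-learn's convention, `components_` has shape `(n_components, n_terms)`. The toy corpus is invented for illustration. Rescaling each component to unit length is an added assumption, not part of the original advice: it turns the dot product into an actual cosine similarity (TF-IDF rows are already L2-normalized by default), so a component with a larger norm cannot dominate the argmax. For comparison, `fit_transform` itself returns the document x component weight matrix `W`, the other factor the question asks about.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration only.
docs = [
    "apples and oranges are fruit",
    "oranges and bananas are fruit",
    "python and java are programming languages",
    "java and rust are programming languages",
]

# Non-negative document x term matrix (one row per document).
X = TfidfVectorizer().fit_transform(docs)

nmf = NMF(n_components=2, init="nndsvd", random_state=0, max_iter=500)
W = nmf.fit_transform(X)  # document x component weights
H = nmf.components_       # component x term matrix

# Assumption: rescale each component to unit L2 norm so that the
# dot product below is a cosine similarity (rows of X are unit-norm).
H_unit = H / np.linalg.norm(H, axis=1, keepdims=True)

# Dot product of every document with every component, shape (n_docs, n_components).
scores = np.asarray(X @ H_unit.T)

# Hard cluster labels: the best-scoring component for each document.
labels = scores.argmax(axis=1)

# Soft clustering: normalize each document's scores so they sum to 1.
soft = scores / scores.sum(axis=1, keepdims=True)

# Alternative labels taken directly from the NMF document factor W.
labels_from_W = W.argmax(axis=1)

print(labels, labels_from_W)
print(soft.round(2))
```

Taking `argmax` gives kmeans-style hard labels for a direct comparison, while the `soft` rows keep the information that a document can belong partly to several topics.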
