2012/9/8 Aliabbas Petiwala <[email protected]>:
> Hi,
> I am trying to cluster a list of text docs based on similarity by first
> identifying the clusters using PCA and then proceeding with a kmeans using
> the results of PCA as shown below. The problem is that the kmeans does
> output the 3 clusters but the plot function fails to display the clustering
> results. The plot only shows one dot on which all cluster centers
> overlap. What I want is a simple cluster diagram visualizing the
> clusters, representing each doc as a dot on the plot and each dot labelled
> with the doc name or number.
>
> vectorizer = TfidfVectorizer(max_features=50, max_df=0.5,
>                              stop_words='english', charset_error='ignore')
> proptable = vectorizer.fit_transform([eereader.raw(f) for f in
>                                       eereader.fileids()])
> X = proptable.todense()
>
> pca = PCA(n_components=2).fit(X)
> X_pca = pca.transform(X)
> print pca.components_, '\nvar=', pca.explained_variance_, '\nratio=\n', pca.explained_variance_ratio_
>
> kmeans = KMeans(3).fit(X_pca)
>
> print 'clusters:', kmeans.cluster_centers_
>
> plot_2D(X_pca, [1,2,3], ['group1','group2','group4'])
>
> def plot_2D(data, target, target_names):
>     colors = cycle('rgbcmykw')
>     target_ids = range(len(target_names))
>     pylab.figure()
>     for i, c, label in zip(target_ids, colors, target_names):
>         pylab.scatter(data[target == i, 0], data[target == i, 1],
>                       c=c, label=label)
>     pylab.legend()
>     pylab.show()

You pass `target=[1, 2, 3]` to your plot function instead of
`target=kmeans.labels_`. `target` should have the same `shape[0]` as `data`,
i.e. one cluster label per document. Since `[1, 2, 3]` is a plain Python
list, `target == i` evaluates to a single `False` rather than a per-row
boolean mask, so the scatter calls never select the points of each cluster.

Furthermore:

    target_ids = range(len(target_names))

is equivalent to:

    target_ids = [0, 1, 2]

which matches the values found in `kmeans.labels_` (0, 1 and 2 for 3
clusters), not `[1, 2, 3]`.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
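
For reference, here is a minimal sketch of how the plotting helper could look once it receives per-document labels. It is not the original poster's code: the names `split_by_label` and `plot_2d` are made up for this illustration, and numbering each dot by its row index is just one possible way to get the per-document labels asked for. It assumes `data` is an `(n_docs, 2)` array such as `X_pca` and `labels` holds one integer cluster id per row, such as `kmeans.labels_`.

```python
import numpy as np


def split_by_label(data, labels, n_clusters):
    """Return one array of points per cluster id 0..n_clusters-1 (hypothetical helper)."""
    data = np.asarray(data)
    # Convert to an array: with a plain list, `labels == i` would be a single bool.
    labels = np.asarray(labels)
    if labels.shape[0] != data.shape[0]:
        raise ValueError("labels must have one entry per row of data")
    # `labels == i` is now a boolean mask selecting the rows of cluster i.
    return [data[labels == i] for i in range(n_clusters)]


def plot_2d(data, labels, target_names):
    """Scatter-plot 2D points, one color per cluster, numbering each point."""
    # Imported here so that split_by_label stays usable without matplotlib.
    import matplotlib.pyplot as plt

    data = np.asarray(data)
    colors = "rgbcmyk"
    groups = split_by_label(data, labels, len(target_names))
    plt.figure()
    for i, (points, name) in enumerate(zip(groups, target_names)):
        plt.scatter(points[:, 0], points[:, 1],
                    c=colors[i % len(colors)], label=name)
    # Annotate every point with its row index, i.e. the document number.
    for doc_id, (x, y) in enumerate(data):
        plt.annotate(str(doc_id), (x, y))
    plt.legend()
    plt.show()
```

With the variables from the quoted script, the call would then be something like `plot_2d(X_pca, kmeans.labels_, ['group1', 'group2', 'group3'])`.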
