Hi,

A quick way I use is to draw a convex hull (SciPy) around the points in each cluster. Here's a short example - k-means with k=2 is run on synthetic data:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
from scipy.spatial import ConvexHull

X, _ = make_blobs(centers=2)
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

# uncomment the next line if you're using a notebook
# %matplotlib inline

for label in set(kmeans.labels_):
    X_clust = X[kmeans.labels_ == label]
    hull = ConvexHull(X_clust, qhull_options='QJ')
    # close the boundary by appending the first hull vertex at the end
    vertices_cycle = hull.vertices.tolist()
    vertices_cycle.append(hull.vertices[0])
    plt.plot(X_clust[vertices_cycle, 0], X_clust[vertices_cycle, 1], 'k--', lw=1)
    plt.scatter(X_clust[:, 0], X_clust[:, 1])

Notes:
1. You can still have overlaps between boundaries - but I think this is a good effort-to-results tradeoff.
2. To draw a closed boundary, you need to append the first vertex to the vertex list returned by the hull - the code above does that.
3. You'd need to handle clusters with <= 2 points explicitly - that case is not shown in the code above (see the sketch below).
4. I use the 'QJ' option to joggle the points a bit when they lie on a line; the other options are listed on the Qhull page (the library SciPy uses internally): http://www.qhull.org/html/qh-optq.htm
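As a rough follow-up to note 3 (not part of the original code), here is a minimal sketch of how the degenerate clusters could be special-cased before calling ConvexHull; it reuses X, kmeans, and plt from the snippet above:

# Sketch: skip/special-case clusters that are too small for a 2-D convex hull.
for label in set(kmeans.labels_):
    X_clust = X[kmeans.labels_ == label]
    if len(X_clust) >= 3:
        hull = ConvexHull(X_clust, qhull_options='QJ')
        vertices_cycle = hull.vertices.tolist() + [hull.vertices[0]]
        plt.plot(X_clust[vertices_cycle, 0], X_clust[vertices_cycle, 1], 'k--', lw=1)
    elif len(X_clust) == 2:
        # two points: the "hull" is just the segment between them
        plt.plot(X_clust[:, 0], X_clust[:, 1], 'k--', lw=1)
    # a single point needs no boundary at all
    plt.scatter(X_clust[:, 0], X_clust[:, 1])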
Regards

On Wed, Dec 9, 2020 at 12:41 PM Brown J.B. via scikit-learn <scikit-learn@python.org> wrote:

> Dear Mahmood,
>
> Andrew's solution with a circle will guarantee you render an image in which every point is covered by some circle.
>
> However, if the data contains outliers or artifacts, you might get circles which are excessively large and distort the image you want.
> For example, imagine if there were a single red point in Andrew's image at the coordinate (3,10); the resulting circle would then cover all points in the entire plot, which is unlikely to be what you want.
> You could potentially generate a density estimate for each class and then have matplotlib render the contour lines (e.g., the solutions of where the estimates take a specific value), but as was said, this is not the job of KMeans, but rather of general data analysis.
>
> The ellipsoid solution proposed to you is, in a sense, a middle ground between these two solutions (the circles and the density plots).
> You could adjust the (4 or 5) parameters of an ellipsoid to cover "most" of the points for a particular class and tolerate that the ellipsoids don't cover a few outliers or artifacts (e.g., the coordinate (3,10) I mentioned above).
> The resulting functional forms of the ellipses might be more precise than circles and less complex than density contours, and might lead to actionable knowledge depending on your context/domain.
>
> Hope this helps.
> J.B. Brown
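As a rough sketch of the density-contour idea described above (not from the original thread), one could fit a kernel density estimate per cluster - here with scipy.stats.gaussian_kde, one of several possible estimators - and let matplotlib draw a single level line of it. The grid resolution and the contour level are arbitrary illustrative choices, and X, kmeans, and plt are assumed from the first snippet:

import numpy as np
from scipy.stats import gaussian_kde

# Evaluate a per-cluster KDE on a grid and draw one density level line per cluster.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
grid = np.vstack([xx.ravel(), yy.ravel()])

for label in set(kmeans.labels_):
    X_clust = X[kmeans.labels_ == label]
    kde = gaussian_kde(X_clust.T)              # gaussian_kde expects shape (n_dims, n_points)
    density = kde(grid).reshape(xx.shape)
    # draw the contour where the density drops to 10% of its peak (arbitrary level)
    plt.contour(xx, yy, density, levels=[0.1 * density.max()], colors='k', linewidths=1)
    plt.scatter(X_clust[:, 0], X_clust[:, 1])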
> On Wed, Dec 9, 2020 at 21:08 Mahmood Naderan <mahmood...@gmail.com> wrote:
>
>> > Mebbe principal components analysis would suggest an
>> > ellipsoid containing "most" of the points in a "cloud".
>>
>> Sorry, I didn't understand. Can you explain more?
>>
>> Regards,
>> Mahmood
>>
>> On Wed, Dec 9, 2020 at 8:55 PM The Helmbolds via scikit-learn <scikit-learn@python.org> wrote:
>>
>>> [scikit-learn] Drawing contours in KMeans4
>>>
>>> Mebbe principal components analysis would suggest an ellipsoid containing "most" of the points in a "cloud".
>>>
>>> "You won't find the right answers if you don't ask the right questions!" (Robert Helmbold, 2013)
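As a sketch of the PCA/ellipsoid suggestion (not part of the original thread), one could take each cluster's mean and covariance, read the principal axes off the covariance eigendecomposition, and draw an ellipse scaled by some number of standard deviations. The factor n_std = 2 below is an arbitrary "cover most of the points" choice, and X, kmeans, and plt are again assumed from the first snippet:

import numpy as np
from matplotlib.patches import Ellipse

# One covariance ellipse per cluster, axes taken from the eigenvectors of
# the cluster covariance (i.e., its principal components).
ax = plt.gca()
n_std = 2.0
for label in set(kmeans.labels_):
    X_clust = X[kmeans.labels_ == label]
    mean = X_clust.mean(axis=0)
    cov = np.cov(X_clust, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))
    width, height = 2 * n_std * np.sqrt(eigvals[::-1])     # major, minor diameters
    ax.add_patch(Ellipse(mean, width, height, angle=angle,
                         fill=False, linestyle='--', edgecolor='k'))
    plt.scatter(X_clust[:, 0], X_clust[:, 1])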
>>> On Wednesday, December 9, 2020, 12:22:49 PM MST, Andrew Howe <ahow...@gmail.com> wrote:
>>>
>>> Ok, I see. Well, the attached notebook demonstrates doing this by simply finding the maximum distance from each centroid to its datapoints and drawing a circle using that radius. It's simple, but will hopefully at least point you in a useful direction.
>>>
>>> [image: image.png]
>>>
>>> Andrew
>>>
>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>>> J. Andrew Howe, PhD
>>> LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
>>> ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
>>> Open Researcher and Contributor ID (ORCID) <http://orcid.org/0000-0002-3553-1990>
>>> Github Profile <http://github.com/ahowe42>
>>> Personal Website <http://www.andrewhowe.com>
>>> I live to learn, so I can learn to live. - me
>>> <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
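The notebook attachment isn't included here, so as a minimal sketch (not the original notebook) of the max-distance circle approach described above: draw one circle per centroid whose radius is the largest distance from that centroid to the points assigned to it. X, kmeans, and plt are assumed from the first snippet:

import numpy as np

# One circle per centroid, radius = farthest assigned point from that centroid.
ax = plt.gca()
for label, center in enumerate(kmeans.cluster_centers_):
    X_clust = X[kmeans.labels_ == label]
    radius = np.linalg.norm(X_clust - center, axis=1).max()
    ax.add_patch(plt.Circle(center, radius, fill=False, linestyle='--', edgecolor='k'))
    plt.scatter(X_clust[:, 0], X_clust[:, 1])
ax.set_aspect('equal')   # so the circles actually look circular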
>>> On Wed, Dec 9, 2020 at 12:59 PM Mahmood Naderan <mahmood...@gmail.com> wrote:
>>>
>>> I mean a circle/contour to group the points in a cluster for better representation.
>>> For example, if there are six clusters, it will be more meaningful to group large data points in a circle or contour.
>>>
>>> Regards,
>>> Mahmood
>>>
>>> On Wed, Dec 9, 2020 at 11:49 AM Andrew Howe <ahow...@gmail.com> wrote:
>>>
>>> Contours generally indicate a third variable - often a probability density. KMeans doesn't provide density estimates, so what precisely would you want the contours to represent?
>>>
>>> Andrew
>>>
>>> On Wed, Dec 9, 2020 at 9:41 AM Mahmood Naderan <mahmood...@gmail.com> wrote:
>>>
>>> Hi,
>>> I use the following code to highlight the cluster centers with some red dots.
>>>
>>> kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=100, n_init=10, random_state=0)
>>> pred_y = kmeans.fit_predict(a)
>>> plt.scatter(a[:, 0], a[:, 1])
>>> plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100, c='red')
>>> plt.show()
>>>
>>> I would like to know if it is possible to draw contours over the clusters. Is there any way for that?
>>> Please let me know if there is a function or option in KMeans.
>>>
>>> Regards,
>>> Mahmood

--
Computers: The eventual realization of Douglas Adams' musings - the world depends on machines controlled by mice.