Hello again, I've added some extra code to plot 6 smaller graphs, each of which should use a different initialization of the centroids. The title of each plot should be the sum of squared distances for that initialization (which should tell me when they converge).
The smaller graphs appear (unlabelled); however, my data is not being plotted. Can someone tell me where I'm going wrong?

n_iter = 9
fig, ax = plt.subplots(3, 3, figsize=(16, 16))
ax = np.ravel(ax)
centers = []
for i in range(n_iter):
    # Run local implementation of kmeans
    max_iter=3, random_state=np.random.randint(0, 1000, size=1)
    clusters = km.clusters
    ax[i].scatter([km.labels == 0, 0], [km.labels == 0, 1], c='green', label='cluster 1')
    ax[i].scatter([km.labels == 1, 0], [km.labels == 1, 1], c='blue', label='cluster 2')
    ax[i].scatter(clusters[:, 0], clusters[:, 1], c='r', marker='*', s=300, label='centroid')
    ax[i].set_xlim([-2, 2])
    ax[i].set_ylim([-2, 2])
    ax[i].legend(loc='lower right')
    ax[i].set_title(f'{km.error:.4f}')
    ax[i].set_aspect('equal')
plt.tight_layout();

Thanks...

________________________________
From: Stephen Malcolm <stephen_malc...@hotmail.com>
Sent: 04 October 2020 21:14
To: scikit-learn@python.org <scikit-learn@python.org>
Subject: Rerunning Kmeans with Python

Hello all,

I've written some code to run Kmeans on a data set (please see below), and I've plotted the results, with my two clusters/centroids. However, I have to re-run Kmeans several times and pull up different plots (showing the different centroid positions). Can someone point me in the right direction on how to write this extra code to perform this task?

Then I have to conclude whether Kmeans is stable. I believe this is judged by the lowest sum of squared errors?

Thanking you in advance.

#pandas used to read dataset and return the data
#numpy and matplotlib to represent and visualize the data
#sklearn to implement kmeans algorithm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

#import the data
data = pd.read_csv('file.csv')

#extract values
V1 = data['V1']
V2 = data['V2']

#stack the two columns into an (n_samples, 2) array
V1_V2 = np.column_stack((V1, V2))

km_res = KMeans(n_clusters=2).fit(V1_V2)
y_kmeans = km_res.predict(V1_V2)

plt.scatter(V1, V2, c=y_kmeans, cmap='viridis', s=50, alpha=0.5)
plt.xlabel('V1')
plt.ylabel('V2')
plt.title('Visualization of raw data');

clusters = km_res.cluster_centers_
plt.scatter(clusters[:, 0], clusters[:, 1], c='blue', s=150)
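
One possible way to re-run the clustering with a different initialization each time is sketched below. This is only a sketch under some assumptions: it uses scikit-learn's `KMeans` (with `init="random"` and `n_init=1`, so each seed gives exactly one random start) rather than the local `km` implementation from the follow-up, it reads the sum of squared distances from the `inertia_` attribute, and it runs on synthetic two-blob data standing in for the V1/V2 columns. The helper names `rerun_kmeans` and `plot_runs` are made up for illustration. Note that the points of each cluster are selected by indexing the data array with a boolean mask, `X[labels == k, 0]`.

```python
import numpy as np
from sklearn.cluster import KMeans


def rerun_kmeans(X, n_runs=9, n_clusters=2, max_iter=3):
    """Fit KMeans n_runs times, one random initialization per run (hypothetical helper)."""
    results = []
    for seed in range(n_runs):
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1,
                    max_iter=max_iter, random_state=seed)
        km.fit(X)
        # inertia_ is the sum of squared distances of samples to their closest centre
        results.append((seed, km.labels_, km.cluster_centers_, km.inertia_))
    return results


def plot_runs(X, results, ncols=3):
    """Draw one small scatter plot per run, titled with that run's inertia (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the fitting code runs without matplotlib

    nrows = int(np.ceil(len(results) / ncols))
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 4 * nrows), squeeze=False)
    for ax, (seed, labels, centers, inertia) in zip(axes.ravel(), results):
        for k in np.unique(labels):
            mask = labels == k  # boolean mask applied to the data array itself
            ax.scatter(X[mask, 0], X[mask, 1], label=f"cluster {k + 1}")
        ax.scatter(centers[:, 0], centers[:, 1], c="r", marker="*", s=300, label="centroid")
        ax.set_title(f"seed {seed}: SSE = {inertia:.4f}")
        ax.set_aspect("equal")
        ax.legend(loc="lower right")
    fig.tight_layout()
    return fig


# Synthetic stand-in for the V1/V2 columns (an assumption, not the real data set).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.3, size=(50, 2)),
               rng.normal(1.0, 0.3, size=(50, 2))])

results = rerun_kmeans(X)
for seed, labels, centers, inertia in results:
    print(f"seed={seed} inertia={inertia:.4f}")
```

Calling `plot_runs(X, results)` followed by `plt.show()` would draw the grid of small plots. If the runs end with similar inertia values and similar centroid positions, that suggests the result does not depend much on the initialization; widely differing values would suggest the opposite.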