Folks,
What changed in MiniBatchKMeans in 0.12?
Running it on datasets.load_digits() gave 10 clusters in 0.11,
but now only 8 in 0.12?
test-mbkmeans.py and logs attached.
(Sure, the dataset is too small for MiniBatch to pay off,
and k-means is, I think, generally weak anyway, so low priority.)
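Boiled down, the check amounts to something like this (a stripped-down sketch of the attached script, same k= keyword; the numbers in the comment come from the attached logs, not from re-running this snippet):
import numpy as np
from sklearn import datasets
from sklearn.cluster import MiniBatchKMeans
X = datasets.load_digits().data                      # (1797, 64), uciml test set only
X /= np.sqrt( (X**2).sum(axis=1) )[:,np.newaxis]     # row-normalise, as the attached script does
mbkm = MiniBatchKMeans( k=10, init="k-means++", n_init=3,
                        batch_size=100, max_iter=20, random_state=0 )
mbkm.fit(X)
print "distinct labels:", len( np.unique( mbkm.labels_ ))  # logs below: 10 under 0.11, only 8 under 0.12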
By the way, datasets.load_digits() is only the 1797-sample uciml/optdigits test set,
not the 5620-sample train+test set on mldata, uci-20070111-optdigits.
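(A possible way to get the full 5620 would be something like the sketch below; untested, the dataset name is just my guess from the mldata id above, and mldata column layouts sometimes need the extra fetch_mldata arguments.)
from sklearn.datasets import fetch_mldata        # untested sketch, not part of the attached script
full = fetch_mldata( "uci-20070111 optdigits" )  # should map to uci-20070111-optdigits on mldata.org
print full.data.shape                            # hoping for 5620 rows, train + test together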
cheers,
have a good weekend
-- denis
""" test MiniBatchKMeans on digits """
# http://scikit-learn.org/stable/auto_examples/document_clustering.html
# For large scale learning (say n_samples > 10k) MiniBatchKMeans is
# probably much faster to than the default batch implementation.
from __future__ import division
import sys
from time import time
import numpy as np
from sklearn import datasets, metrics, __version__
from sklearn.cluster import MiniBatchKMeans
# $sklearnsrc11/cluster/k_means_.py
# from bz.etc import centref, confus
__date__ = "2012-09-07 Sep denis"
#..............................................................................
ks = [10] # [9,10,11]
source = "uciml/optdigits*" # 3823 train 1797 test
# source = "newsgroups" # density 1 %
sparse = False
nnewscat = 5 # read 5 newsgroups: 2467 train + 1642 test
nnewsfeat = 10000
centre = 4
# MiniBatchKMeans --
batchsize = 100 # default 100
maxiter = 20 # default 20
tol = 0
init = "k-means++" # "random"
ninit = 3
seed = 0
exec( "\n".join( sys.argv[1:] )) # run this.py N= ...
np.random.seed(seed)
np.set_printoptions( 1, threshold=100, edgeitems=10, suppress=True )
if sparse:
init = "random" # ValueError: Init method 'k-means++' only for dense X.
#..............................................................................
bag = datasets.load_digits() # (1797, 64) uciml test only
X, y = bag.data, bag.target
if centre:
norms = np.sqrt( np.sum( X**2, axis=1 )) # sparse: TypeError
X /= norms[:,np.newaxis]
print "\n", 80 * "-"
print "sklearn version", __version__
print "%s %s ks %s init %s ninit %d batchsize %d maxiter %d tol %.2g centre %d sparse %s " % (
source, X.shape, ks, init, ninit, batchsize, maxiter, tol, centre, sparse )
def clustersizes( labels ):
return np.sort( np.bincount( labels ))[::-1]
#..............................................................................
def mbkmeans( X, labels, k ):
mbkm = MiniBatchKMeans( k=k, max_iter=maxiter, random_state=seed,
batch_size=batchsize, tol=tol, verbose=1,
init=init, n_init=ninit )
print mbkm
print "cluster sizes true: %s" % clustersizes( labels )
t0 = time()
mbkm.fit(X)
print "MiniBatchKMeans took %0.1fs" % (time() - t0)
print "cluster sizes MiniBatchKmeans: %s" % clustersizes( mbkm.labels_ )
# fancy --
# print "Homogeneity: %0.3f" % metrics.homogeneity_score(labels, mbkm.labels_)
# print "Completeness: %0.3f" % metrics.completeness_score(labels, mbkm.labels_)
# print "V-measure: %0.3f" % metrics.v_measure_score(labels, mbkm.labels_)
# print "Adjusted Rand-Index: %.3f" % \
# metrics.adjusted_rand_score(labels, mbkm.labels_)
return mbkm
#..............................................................................
for k in ks:
km = mbkmeans( X, y, k=k )
# $sklearn/metrics/cluster/supervised.py contingency_matrix
# confusmat = confus.pconfus( y, km.labels_ )
# print "confus sum col max / total: %.0f %%" % (
# confusmat.max(axis=1).sum() / confusmat.sum() * 100)
centres = km.cluster_centers_ # ncluster x dim
savetxt = "mbkmeans-%s-centres.nptxt" % __version__[2:]
print "np.savetxt", savetxt
np.savetxt( savetxt, centres, fmt="%.3g" )
# from: test-mbkmeans.py
# run: 7 Sep 2012 13:07 in ~bz/py/ml/sklearn/minibatchkmeans mac 10.4.11 ppc
--------------------------------------------------------------------------------
sklearn version 0.11
uciml/optdigits* (1797, 64) ks [10] init k-means++ ninit 3 batchsize 100 maxiter 20 tol 0 centre 4 sparse False
MiniBatchKMeans(batch_size=100, chunk_size=None, compute_labels=True,
init=k-means++, init_size=None, k=10, max_iter=20,
max_no_improvement=10, n_init=3, random_state=0, tol=0, verbose=1)
cluster sizes true: [183 182 182 181 181 180 179 178 177 174]
Init 1/3 with method: k-means++
Inertia for init 1/3: 56.070999
Init 2/3 with method: k-means++
Inertia for init 2/3: 54.475154
Init 3/3 with method: k-means++
Inertia for init 3/3: 54.961484
Minibatch iteration 1/360: mean batch inertia: 0.202572, ewa inertia: 0.202572
Minibatch iteration 2/360: mean batch inertia: 0.175550, ewa inertia: 0.199566
Minibatch iteration 3/360: mean batch inertia: 0.171360, ewa inertia: 0.196429
Minibatch iteration 4/360: mean batch inertia: 0.184734, ewa inertia: 0.195128
Minibatch iteration 5/360: mean batch inertia: 0.187627, ewa inertia: 0.194293
Minibatch iteration 6/360: mean batch inertia: 0.171662, ewa inertia: 0.191776
Minibatch iteration 7/360: mean batch inertia: 0.183932, ewa inertia: 0.190904
Minibatch iteration 8/360: mean batch inertia: 0.174830, ewa inertia: 0.189116
Minibatch iteration 9/360: mean batch inertia: 0.170346, ewa inertia: 0.187028
Minibatch iteration 10/360: mean batch inertia: 0.176690, ewa inertia: 0.185878
Minibatch iteration 11/360: mean batch inertia: 0.169825, ewa inertia: 0.184092
Minibatch iteration 12/360: mean batch inertia: 0.180244, ewa inertia: 0.183664
Minibatch iteration 13/360: mean batch inertia: 0.187240, ewa inertia: 0.184062
Minibatch iteration 14/360: mean batch inertia: 0.172794, ewa inertia: 0.182809
Minibatch iteration 15/360: mean batch inertia: 0.176271, ewa inertia: 0.182081
Minibatch iteration 16/360: mean batch inertia: 0.187222, ewa inertia: 0.182653
Minibatch iteration 17/360: mean batch inertia: 0.186044, ewa inertia: 0.183030
Minibatch iteration 18/360: mean batch inertia: 0.162337, ewa inertia: 0.180728
Minibatch iteration 19/360: mean batch inertia: 0.162589, ewa inertia: 0.178711
Minibatch iteration 20/360: mean batch inertia: 0.169023, ewa inertia: 0.177633
Minibatch iteration 21/360: mean batch inertia: 0.172802, ewa inertia: 0.177096
Minibatch iteration 22/360: mean batch inertia: 0.182926, ewa inertia: 0.177744
Minibatch iteration 23/360: mean batch inertia: 0.178225, ewa inertia: 0.177798
Minibatch iteration 24/360: mean batch inertia: 0.196936, ewa inertia: 0.179927
Minibatch iteration 25/360: mean batch inertia: 0.171855, ewa inertia: 0.179029
Minibatch iteration 26/360: mean batch inertia: 0.183285, ewa inertia: 0.179502
Minibatch iteration 27/360: mean batch inertia: 0.157269, ewa inertia: 0.177029
Minibatch iteration 28/360: mean batch inertia: 0.168802, ewa inertia: 0.176114
Minibatch iteration 29/360: mean batch inertia: 0.189933, ewa inertia: 0.177651
Minibatch iteration 30/360: mean batch inertia: 0.172926, ewa inertia: 0.177125
Minibatch iteration 31/360: mean batch inertia: 0.174159, ewa inertia: 0.176795
Minibatch iteration 32/360: mean batch inertia: 0.170243, ewa inertia: 0.176067
Minibatch iteration 33/360: mean batch inertia: 0.163577, ewa inertia: 0.174677
Minibatch iteration 34/360: mean batch inertia: 0.177372, ewa inertia: 0.174977
Minibatch iteration 35/360: mean batch inertia: 0.169673, ewa inertia: 0.174387
Minibatch iteration 36/360: mean batch inertia: 0.171826, ewa inertia: 0.174102
Minibatch iteration 37/360: mean batch inertia: 0.170923, ewa inertia: 0.173749
Minibatch iteration 38/360: mean batch inertia: 0.153546, ewa inertia: 0.171501
Minibatch iteration 39/360: mean batch inertia: 0.163088, ewa inertia: 0.170566
Minibatch iteration 40/360: mean batch inertia: 0.156397, ewa inertia: 0.168989
Minibatch iteration 41/360: mean batch inertia: 0.168863, ewa inertia: 0.168975
Minibatch iteration 42/360: mean batch inertia: 0.159743, ewa inertia: 0.167948
Minibatch iteration 43/360: mean batch inertia: 0.180480, ewa inertia: 0.169342
Minibatch iteration 44/360: mean batch inertia: 0.160384, ewa inertia: 0.168346
Minibatch iteration 45/360: mean batch inertia: 0.174137, ewa inertia: 0.168990
Minibatch iteration 46/360: mean batch inertia: 0.172547, ewa inertia: 0.169386
Minibatch iteration 47/360: mean batch inertia: 0.162742, ewa inertia: 0.168647
Minibatch iteration 48/360: mean batch inertia: 0.166977, ewa inertia: 0.168461
Minibatch iteration 49/360: mean batch inertia: 0.176553, ewa inertia: 0.169361
Minibatch iteration 50/360: mean batch inertia: 0.170433, ewa inertia: 0.169480
Minibatch iteration 51/360: mean batch inertia: 0.167745, ewa inertia: 0.169287
Minibatch iteration 52/360: mean batch inertia: 0.153796, ewa inertia: 0.167564
Minibatch iteration 53/360: mean batch inertia: 0.152007, ewa inertia: 0.165834
Minibatch iteration 54/360: mean batch inertia: 0.168747, ewa inertia: 0.166158
Minibatch iteration 55/360: mean batch inertia: 0.159201, ewa inertia: 0.165384
Minibatch iteration 56/360: mean batch inertia: 0.173691, ewa inertia: 0.166308
Minibatch iteration 57/360: mean batch inertia: 0.175981, ewa inertia: 0.167384
Minibatch iteration 58/360: mean batch inertia: 0.153377, ewa inertia: 0.165826
Minibatch iteration 59/360: mean batch inertia: 0.164798, ewa inertia: 0.165712
Minibatch iteration 60/360: mean batch inertia: 0.158669, ewa inertia: 0.164928
Minibatch iteration 61/360: mean batch inertia: 0.163401, ewa inertia: 0.164758
Minibatch iteration 62/360: mean batch inertia: 0.161074, ewa inertia: 0.164348
Minibatch iteration 63/360: mean batch inertia: 0.172609, ewa inertia: 0.165267
Minibatch iteration 64/360: mean batch inertia: 0.173305, ewa inertia: 0.166161
Minibatch iteration 65/360: mean batch inertia: 0.168491, ewa inertia: 0.166421
Minibatch iteration 66/360: mean batch inertia: 0.177667, ewa inertia: 0.167672
Minibatch iteration 67/360: mean batch inertia: 0.174047, ewa inertia: 0.168381
Minibatch iteration 68/360: mean batch inertia: 0.161913, ewa inertia: 0.167661
Minibatch iteration 69/360: mean batch inertia: 0.177375, ewa inertia: 0.168742
Minibatch iteration 70/360: mean batch inertia: 0.163209, ewa inertia: 0.168126
Minibatch iteration 71/360: mean batch inertia: 0.159076, ewa inertia: 0.167120
Minibatch iteration 72/360: mean batch inertia: 0.180215, ewa inertia: 0.168576
Converged (lack of improvement in inertia) at iteration 72/360
Computing label assignements and total inertia
MiniBatchKMeans took 1.9s
cluster sizes MiniBatchKmeans: [275 195 181 178 173 173 172 166 145 139]
Confusion matrix: 10.5 % correct = 189 / 1797
True classes down, estimated across / true class sizes
0: 2 176 / 178 0 %
1: 2 23 2 1 103 51 / 182 0 %
2: 7 1 144 3 1 6 15 / 177 1 %
3: 10 103 3 2 4 61 / 183 2 %
4: 4 169 5 3 / 181 0 %
5: 2 37 2 138 3 / 182 76 %
6: 1 176 1 3 / 181 0 %
7: 18 149 12 / 179 0 %
8: 121 5 2 1 1 2 12 30 / 174 7 %
9: 3 129 3 10 35 / 180 19 %
--------------------------------------------------
166 173 275 172 181 145 173 178 139 195 estimates in each class
93 95 155 94 100 80 96 99 80 108 est / true %
confus sum col max / total: 78 %
# from: test-mbkmeans.py
# run: 7 Sep 2012 12:59 in ~bz/py/ml/sklearn/minibatchkmeans mac 10.4.11 ppc
--------------------------------------------------------------------------------
sklearn version 0.12
uciml/optdigits* (1797, 64) ks [10] init k-means++ ninit 3 batchsize 100 maxiter 20 tol 0 centre 4 sparse False
MiniBatchKMeans(batch_size=100, compute_labels=True, init=k-means++,
init_size=None, k=10, max_iter=20, max_no_improvement=10,
n_clusters=8, n_init=3, random_state=0, tol=0, verbose=1)
cluster sizes true: [183 182 182 181 181 180 179 178 177 174]
Init 1/3 with method: k-means++
Inertia for init 1/3: 62.299326
Init 2/3 with method: k-means++
Inertia for init 2/3: 55.023277
Init 3/3 with method: k-means++
Inertia for init 3/3: 58.116186
Minibatch iteration 1/360: mean batch inertia: 0.185486, ewa inertia: 0.185486
Minibatch iteration 2/360: mean batch inertia: 0.196757, ewa inertia: 0.186740
Minibatch iteration 3/360: mean batch inertia: 0.182554, ewa inertia: 0.186274
Minibatch iteration 4/360: mean batch inertia: 0.184407, ewa inertia: 0.186066
Minibatch iteration 5/360: mean batch inertia: 0.181310, ewa inertia: 0.185537
Minibatch iteration 6/360: mean batch inertia: 0.178348, ewa inertia: 0.184737
Minibatch iteration 7/360: mean batch inertia: 0.188762, ewa inertia: 0.185185
Minibatch iteration 8/360: mean batch inertia: 0.190513, ewa inertia: 0.185778
Minibatch iteration 9/360: mean batch inertia: 0.183353, ewa inertia: 0.185508
Minibatch iteration 10/360: mean batch inertia: 0.169423, ewa inertia: 0.183719
Minibatch iteration 11/360: mean batch inertia: 0.185351, ewa inertia: 0.183900
Minibatch iteration 12/360: mean batch inertia: 0.185132, ewa inertia: 0.184037
Minibatch iteration 13/360: mean batch inertia: 0.190164, ewa inertia: 0.184719
Minibatch iteration 14/360: mean batch inertia: 0.189076, ewa inertia: 0.185204
Minibatch iteration 15/360: mean batch inertia: 0.179912, ewa inertia: 0.184615
Minibatch iteration 16/360: mean batch inertia: 0.192346, ewa inertia: 0.185475
Minibatch iteration 17/360: mean batch inertia: 0.186748, ewa inertia: 0.185617
Minibatch iteration 18/360: mean batch inertia: 0.194699, ewa inertia: 0.186627
Minibatch iteration 19/360: mean batch inertia: 0.161045, ewa inertia: 0.183781
Minibatch iteration 20/360: mean batch inertia: 0.182165, ewa inertia: 0.183601
Minibatch iteration 21/360: mean batch inertia: 0.181074, ewa inertia: 0.183320
Minibatch iteration 22/360: mean batch inertia: 0.197469, ewa inertia: 0.184894
Minibatch iteration 23/360: mean batch inertia: 0.175425, ewa inertia: 0.183841
Minibatch iteration 24/360: mean batch inertia: 0.208165, ewa inertia: 0.186546
Minibatch iteration 25/360: mean batch inertia: 0.201142, ewa inertia: 0.188170
Minibatch iteration 26/360: mean batch inertia: 0.174979, ewa inertia: 0.186703
Minibatch iteration 27/360: mean batch inertia: 0.185294, ewa inertia: 0.186546
Minibatch iteration 28/360: mean batch inertia: 0.171763, ewa inertia: 0.184902
Minibatch iteration 29/360: mean batch inertia: 0.177808, ewa inertia: 0.184113
Minibatch iteration 30/360: mean batch inertia: 0.185854, ewa inertia: 0.184306
Minibatch iteration 31/360: mean batch inertia: 0.185587, ewa inertia: 0.184449
Converged (lack of improvement in inertia) at iteration 31/360
Computing label assignements and total inertia
MiniBatchKMeans took 1.1s
cluster sizes MiniBatchKmeans: [423 263 201 189 181 181 180 179]
warning: pconfus true max 9 != est max 7
Confusion matrix: 7.6 % correct = 136 / 1797
True classes down, estimated across / true class sizes
0: 176 2 / 178 0 %
1: 133 3 22 1 23 / 182 73 %
2: 2 11 1 12 151 / 177 1 %
3: 10 7 160 3 3 / 183 0 %
4: 8 4 168 1 / 181 0 %
5: 1 27 1 153 / 182 1 %
6: 4 1 176 / 181 0 %
7: 171 1 1 5 1 / 179 1 %
8: 2 100 1 1 50 9 11 / 174 0 %
9: 8 3 152 8 9 / 180 0 %
--------------------------------------------------
201 263 179 181 423 180 181 189 0 0 estimates in each class
113 145 101 99 234 99 100 106 0 0 est / true %
confus sum col max / total: 86 %