Folks,

  what changed in MiniBatchKMeans in .12?
Running it on datasets.load_digits() with k=10 gave 10 clusters in .11
but gives only 8 in .12.
test-mbkmeans.py and logs are attached.
(Sure, the dataset is too small for MiniBatch,
and for that matter I think kmeans is generally weak, so this is low priority.)

By the way, datasets.load_digits() is only the 1797-sample uciml/optdigits test set,
not the 5620 train+test samples in mldata uci-20070111-optdigits.

cheers,
bon weekend
  -- denis


""" test MiniBatchKMeans on digits """
# http://scikit-learn.org/stable/auto_examples/document_clustering.html
# For large scale learning (say n_samples > 10k) MiniBatchKMeans is
# probably much faster than the default batch implementation.

from __future__ import division
import sys
from time import time
import numpy as np

from sklearn import datasets, metrics, __version__
from sklearn.cluster import MiniBatchKMeans
    # $sklearnsrc11/cluster/k_means_.py

# from bz.etc import centref, confus

__date__ = "2012-09-07 Sep denis"

#..............................................................................
ks = [10]  # [9,10,11]
source = "uciml/optdigits*"  # 3823 train 1797 test
# source = "newsgroups"  # density 1 %
sparse = False
nnewscat = 5  # read 5 newsgroups: 2467 train + 1642 test
nnewsfeat = 10000
centre = 4
    #  MiniBatchKMeans --
batchsize = 100  # default 100
maxiter = 20  # default 20
tol = 0
init = "k-means++"  # "random"
ninit = 3

seed = 0
exec( "\n".join( sys.argv[1:] ))  # run this.py N= ...
np.random.seed(seed)
np.set_printoptions( 1, threshold=100, edgeitems=10, suppress=True )
if sparse:
    init = "random"  # ValueError: Init method 'k-means++' only for dense X.

#..............................................................................
bag = datasets.load_digits()  #  (1797, 64) uciml test only
X, y = bag.data, bag.target
if centre:
    # note: this scales each row to unit L2 norm; it does not subtract a mean
    norms = np.sqrt( np.sum( X**2, axis=1 ))  # sparse: TypeError
    X /= norms[:,np.newaxis]

print "\n", 80 * "-"
print "sklearn version", __version__
print "%s  %s  ks %s  init %s  ninit %d  batchsize %d  maxiter %d  tol %.2g  centre %d  sparse %s " % (
    source, X.shape, ks, init, ninit, batchsize, maxiter, tol, centre, sparse )

def clustersizes( labels ):
    return np.sort( np.bincount( labels ))[::-1]

#..............................................................................
def mbkmeans( X, labels, k ):
    mbkm = MiniBatchKMeans( k=k, max_iter=maxiter, random_state=seed,
           batch_size=batchsize, tol=tol, verbose=1,
           init=init, n_init=ninit )
    print mbkm
    print "cluster sizes true: %s" % clustersizes( labels )
    t0 = time()
    mbkm.fit(X)
    print "MiniBatchKMeans took %0.1fs" % (time() - t0)
    print "cluster sizes MiniBatchKmeans: %s" % clustersizes( mbkm.labels_ )

#     fancy --
#     print "Homogeneity: %0.3f" % metrics.homogeneity_score(labels, mbkm.labels_)
#     print "Completeness: %0.3f" % metrics.completeness_score(labels, mbkm.labels_)
#     print "V-measure: %0.3f" % metrics.v_measure_score(labels, mbkm.labels_)
#     print "Adjusted Rand-Index: %.3f" % \
#         metrics.adjusted_rand_score(labels, mbkm.labels_)
    return mbkm


#..............................................................................
for k in ks:
    km = mbkmeans( X, y, k=k )
        # $sklearn/metrics/cluster/supervised.py  contingency_matrix
        # confusmat = confus.pconfus( y, km.labels_ )
        # print "confus sum col max / total: %.0f %%" % (
        #     confusmat.max(axis=1).sum() / confusmat.sum() * 100)
    centres = km.cluster_centers_  # ncluster x dim
    savetxt = "mbkmeans-%s-centres.nptxt" % __version__[2:]
    print "np.savetxt", savetxt
    np.savetxt( savetxt, centres, fmt="%.3g" )

# from: test-mbkmeans.py
# run: 7 Sep 2012 13:07  in ~bz/py/ml/sklearn/minibatchkmeans    mac 10.4.11 ppc


--------------------------------------------------------------------------------
sklearn version .11
uciml/optdigits*  (1797, 64)  ks [10]  init k-means++  ninit 3  batchsize 100  maxiter 20  tol 0  centre 4  sparse False
MiniBatchKMeans(batch_size=100, chunk_size=None, compute_labels=True,
        init=k-means++, init_size=None, k=10, max_iter=20,
        max_no_improvement=10, n_init=3, random_state=0, tol=0, verbose=1)
cluster sizes true: [183 182 182 181 181 180 179 178 177 174]
Init 1/3 with method: k-means++
Inertia for init 1/3: 56.070999
Init 2/3 with method: k-means++
Inertia for init 2/3: 54.475154
Init 3/3 with method: k-means++
Inertia for init 3/3: 54.961484
Minibatch iteration 1/360:mean batch inertia: .202572, ewa inertia: .202572 
Minibatch iteration 2/360:mean batch inertia: .175550, ewa inertia: .199566 
Minibatch iteration 3/360:mean batch inertia: .171360, ewa inertia: .196429 
Minibatch iteration 4/360:mean batch inertia: .184734, ewa inertia: .195128 
Minibatch iteration 5/360:mean batch inertia: .187627, ewa inertia: .194293 
Minibatch iteration 6/360:mean batch inertia: .171662, ewa inertia: .191776 
Minibatch iteration 7/360:mean batch inertia: .183932, ewa inertia: .190904 
Minibatch iteration 8/360:mean batch inertia: .174830, ewa inertia: .189116 
Minibatch iteration 9/360:mean batch inertia: .170346, ewa inertia: .187028 
Minibatch iteration 10/360:mean batch inertia: .176690, ewa inertia: .185878 
Minibatch iteration 11/360:mean batch inertia: .169825, ewa inertia: .184092 
Minibatch iteration 12/360:mean batch inertia: .180244, ewa inertia: .183664 
Minibatch iteration 13/360:mean batch inertia: .187240, ewa inertia: .184062 
Minibatch iteration 14/360:mean batch inertia: .172794, ewa inertia: .182809 
Minibatch iteration 15/360:mean batch inertia: .176271, ewa inertia: .182081 
Minibatch iteration 16/360:mean batch inertia: .187222, ewa inertia: .182653 
Minibatch iteration 17/360:mean batch inertia: .186044, ewa inertia: .183030 
Minibatch iteration 18/360:mean batch inertia: .162337, ewa inertia: .180728 
Minibatch iteration 19/360:mean batch inertia: .162589, ewa inertia: .178711 
Minibatch iteration 20/360:mean batch inertia: .169023, ewa inertia: .177633 
Minibatch iteration 21/360:mean batch inertia: .172802, ewa inertia: .177096 
Minibatch iteration 22/360:mean batch inertia: .182926, ewa inertia: .177744 
Minibatch iteration 23/360:mean batch inertia: .178225, ewa inertia: .177798 
Minibatch iteration 24/360:mean batch inertia: .196936, ewa inertia: .179927 
Minibatch iteration 25/360:mean batch inertia: .171855, ewa inertia: .179029 
Minibatch iteration 26/360:mean batch inertia: .183285, ewa inertia: .179502 
Minibatch iteration 27/360:mean batch inertia: .157269, ewa inertia: .177029 
Minibatch iteration 28/360:mean batch inertia: .168802, ewa inertia: .176114 
Minibatch iteration 29/360:mean batch inertia: .189933, ewa inertia: .177651 
Minibatch iteration 30/360:mean batch inertia: .172926, ewa inertia: .177125 
Minibatch iteration 31/360:mean batch inertia: .174159, ewa inertia: .176795 
Minibatch iteration 32/360:mean batch inertia: .170243, ewa inertia: .176067 
Minibatch iteration 33/360:mean batch inertia: .163577, ewa inertia: .174677 
Minibatch iteration 34/360:mean batch inertia: .177372, ewa inertia: .174977 
Minibatch iteration 35/360:mean batch inertia: .169673, ewa inertia: .174387 
Minibatch iteration 36/360:mean batch inertia: .171826, ewa inertia: .174102 
Minibatch iteration 37/360:mean batch inertia: .170923, ewa inertia: .173749 
Minibatch iteration 38/360:mean batch inertia: .153546, ewa inertia: .171501 
Minibatch iteration 39/360:mean batch inertia: .163088, ewa inertia: .170566 
Minibatch iteration 40/360:mean batch inertia: .156397, ewa inertia: .168989 
Minibatch iteration 41/360:mean batch inertia: .168863, ewa inertia: .168975 
Minibatch iteration 42/360:mean batch inertia: .159743, ewa inertia: .167948 
Minibatch iteration 43/360:mean batch inertia: .180480, ewa inertia: .169342 
Minibatch iteration 44/360:mean batch inertia: .160384, ewa inertia: .168346 
Minibatch iteration 45/360:mean batch inertia: .174137, ewa inertia: .168990 
Minibatch iteration 46/360:mean batch inertia: .172547, ewa inertia: .169386 
Minibatch iteration 47/360:mean batch inertia: .162742, ewa inertia: .168647 
Minibatch iteration 48/360:mean batch inertia: .166977, ewa inertia: .168461 
Minibatch iteration 49/360:mean batch inertia: .176553, ewa inertia: .169361 
Minibatch iteration 50/360:mean batch inertia: .170433, ewa inertia: .169480 
Minibatch iteration 51/360:mean batch inertia: .167745, ewa inertia: .169287 
Minibatch iteration 52/360:mean batch inertia: .153796, ewa inertia: .167564 
Minibatch iteration 53/360:mean batch inertia: .152007, ewa inertia: .165834 
Minibatch iteration 54/360:mean batch inertia: .168747, ewa inertia: .166158 
Minibatch iteration 55/360:mean batch inertia: .159201, ewa inertia: .165384 
Minibatch iteration 56/360:mean batch inertia: .173691, ewa inertia: .166308 
Minibatch iteration 57/360:mean batch inertia: .175981, ewa inertia: .167384 
Minibatch iteration 58/360:mean batch inertia: .153377, ewa inertia: .165826 
Minibatch iteration 59/360:mean batch inertia: .164798, ewa inertia: .165712 
Minibatch iteration 60/360:mean batch inertia: .158669, ewa inertia: .164928 
Minibatch iteration 61/360:mean batch inertia: .163401, ewa inertia: .164758 
Minibatch iteration 62/360:mean batch inertia: .161074, ewa inertia: .164348 
Minibatch iteration 63/360:mean batch inertia: .172609, ewa inertia: .165267 
Minibatch iteration 64/360:mean batch inertia: .173305, ewa inertia: .166161 
Minibatch iteration 65/360:mean batch inertia: .168491, ewa inertia: .166421 
Minibatch iteration 66/360:mean batch inertia: .177667, ewa inertia: .167672 
Minibatch iteration 67/360:mean batch inertia: .174047, ewa inertia: .168381 
Minibatch iteration 68/360:mean batch inertia: .161913, ewa inertia: .167661 
Minibatch iteration 69/360:mean batch inertia: .177375, ewa inertia: .168742 
Minibatch iteration 70/360:mean batch inertia: .163209, ewa inertia: .168126 
Minibatch iteration 71/360:mean batch inertia: .159076, ewa inertia: .167120 
Minibatch iteration 72/360:mean batch inertia: .180215, ewa inertia: .168576 
Converged (lack of improvement in inertia) at iteration 72/360
Computing label assignements and total inertia
MiniBatchKMeans took 1.9s
cluster sizes MiniBatchKmeans: [275 195 181 178 173 173 172 166 145 139]
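
As a sanity check on the verbose output above, here is a small sketch that reproduces the 360 in "iteration 1/360", the second "ewa inertia" value, and the stop at iteration 72. The formulas are assumptions inferred from the printed numbers, not read from the sklearn source: total steps = max_iter * ceil(n_samples / batch_size), smoothing factor alpha = 2 * batch_size / (n_samples + 1), and stopping after max_no_improvement steps without a new minimum. The function names are made up for illustration.

```python
import math

# Settings taken from the header line of the run above.
n_samples, batch_size, max_iter, max_no_improvement = 1797, 100, 20, 10

# Assumption: total minibatch steps = max_iter * ceil(n_samples / batch_size).
n_batches = int(math.ceil(float(n_samples) / batch_size))  # 18
n_steps = max_iter * n_batches
print(n_steps)  # 360

# Assumption: exponentially weighted average with this smoothing factor.
alpha = 2.0 * batch_size / (n_samples + 1)

def ewa_update(ewa, batch_inertia, alpha):
    """Blend the previous average with the newest batch inertia."""
    return ewa * (1 - alpha) + batch_inertia * alpha

# Iteration 1 ewa and iteration 2 batch inertia, copied from the log.
ewa2 = ewa_update(0.202572, 0.175550, alpha)
print("%.6f" % ewa2)  # 0.199566

def stop_iteration(ewa_values, first_iteration, max_no_improvement):
    """Return the iteration at which the no-improvement counter hits the limit."""
    best = None
    no_improvement = 0
    for offset, ewa in enumerate(ewa_values):
        if best is None or ewa < best:
            best = ewa
            no_improvement = 0
        else:
            no_improvement += 1
        if no_improvement >= max_no_improvement:
            return first_iteration + offset
    return None

# ewa inertia for iterations 62..72, copied from the log (62 is the minimum).
tail = [0.164348, 0.165267, 0.166161, 0.166421, 0.167672, 0.168381,
        0.167661, 0.168742, 0.168126, 0.167120, 0.168576]
stopped_at = stop_iteration(tail, 62, max_no_improvement)
print(stopped_at)  # 72
```

If these guesses are right, the early stop is driven purely by the smoothed inertia, so a noisy batch_size=100 on 1797 samples can end the run well before 360 steps.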

Confusion matrix: 10.5 % correct = 189 / 1797
True classes down, estimated across  / true class sizes
0:         2                           176            /  178  0 %
1:    2             23    2    1            103   51  /  182  0 %
2:    7         1  144              3    1    6   15  /  177  1 %
3:   10       103    3         2    4             61  /  183  2 %
4:    4  169                        5         3       /  181  0 %
5:         2   37         2  138                   3  /  182  76 %
6:    1                 176              1    3       /  181  0 %
7:   18                           149        12       /  179  0 %
8:  121         5    2    1    1    2        12   30  /  174  7 %
9:    3       129              3   10             35  /  180  19 %
  --------------------------------------------------
    166  173  275  172  181  145  173  178  139  195  estimates in each class
     93   95  155   94  100   80   96   99   80  108  est / true %

confus sum col max / total: 78 %
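
For reference, the 78 % above comes from the formula in the commented-out lines of the script, confusmat.max(axis=1).sum() / confusmat.sum(): take the largest count in each true-class row, add them up, and divide by the total. A minimal sketch in plain Python, with the nonzero counts copied from the matrix above (column positions are dropped, since they affect neither the row maximum nor the total); the function name is made up for illustration.

```python
# Nonzero counts per true class (rows 0..9), copied from the matrix above.
rows = [
    [2, 176],
    [2, 23, 2, 1, 103, 51],
    [7, 1, 144, 3, 1, 6, 15],
    [10, 103, 3, 2, 4, 61],
    [4, 169, 5, 3],
    [2, 37, 2, 138, 3],
    [1, 176, 1, 3],
    [18, 149, 12],
    [121, 5, 2, 1, 1, 2, 12, 30],
    [3, 129, 3, 10, 35],
]

def rowmax_percent(rows):
    """Percent of samples that fall in the largest cell of their true-class row."""
    total = sum(sum(r) for r in rows)
    return 100.0 * sum(max(r) for r in rows) / total

total = sum(sum(r) for r in rows)
print(total)                              # 1797
print("%.0f %%" % rowmax_percent(rows))   # 78 %
```

Unlike the "% correct" on the diagonal, this score does not depend on how cluster numbers happen to line up with digit labels.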
# from: test-mbkmeans.py
# run: 7 Sep 2012 12:59  in ~bz/py/ml/sklearn/minibatchkmeans    mac 10.4.11 ppc


--------------------------------------------------------------------------------
sklearn version .12
uciml/optdigits*  (1797, 64)  ks [10]  init k-means++  ninit 3  batchsize 100  maxiter 20  tol 0  centre 4  sparse False
MiniBatchKMeans(batch_size=100, compute_labels=True, init=k-means++,
        init_size=None, k=10, max_iter=20, max_no_improvement=10,
        n_clusters=8, n_init=3, random_state=0, tol=0, verbose=1)
cluster sizes true: [183 182 182 181 181 180 179 178 177 174]
Init 1/3 with method: k-means++
Inertia for init 1/3: 62.299326
Init 2/3 with method: k-means++
Inertia for init 2/3: 55.023277
Init 3/3 with method: k-means++
Inertia for init 3/3: 58.116186
Minibatch iteration 1/360:mean batch inertia: .185486, ewa inertia: .185486 
Minibatch iteration 2/360:mean batch inertia: .196757, ewa inertia: .186740 
Minibatch iteration 3/360:mean batch inertia: .182554, ewa inertia: .186274 
Minibatch iteration 4/360:mean batch inertia: .184407, ewa inertia: .186066 
Minibatch iteration 5/360:mean batch inertia: .181310, ewa inertia: .185537 
Minibatch iteration 6/360:mean batch inertia: .178348, ewa inertia: .184737 
Minibatch iteration 7/360:mean batch inertia: .188762, ewa inertia: .185185 
Minibatch iteration 8/360:mean batch inertia: .190513, ewa inertia: .185778 
Minibatch iteration 9/360:mean batch inertia: .183353, ewa inertia: .185508 
Minibatch iteration 10/360:mean batch inertia: .169423, ewa inertia: .183719 
Minibatch iteration 11/360:mean batch inertia: .185351, ewa inertia: .183900 
Minibatch iteration 12/360:mean batch inertia: .185132, ewa inertia: .184037 
Minibatch iteration 13/360:mean batch inertia: .190164, ewa inertia: .184719 
Minibatch iteration 14/360:mean batch inertia: .189076, ewa inertia: .185204 
Minibatch iteration 15/360:mean batch inertia: .179912, ewa inertia: .184615 
Minibatch iteration 16/360:mean batch inertia: .192346, ewa inertia: .185475 
Minibatch iteration 17/360:mean batch inertia: .186748, ewa inertia: .185617 
Minibatch iteration 18/360:mean batch inertia: .194699, ewa inertia: .186627 
Minibatch iteration 19/360:mean batch inertia: .161045, ewa inertia: .183781 
Minibatch iteration 20/360:mean batch inertia: .182165, ewa inertia: .183601 
Minibatch iteration 21/360:mean batch inertia: .181074, ewa inertia: .183320 
Minibatch iteration 22/360:mean batch inertia: .197469, ewa inertia: .184894 
Minibatch iteration 23/360:mean batch inertia: .175425, ewa inertia: .183841 
Minibatch iteration 24/360:mean batch inertia: .208165, ewa inertia: .186546 
Minibatch iteration 25/360:mean batch inertia: .201142, ewa inertia: .188170 
Minibatch iteration 26/360:mean batch inertia: .174979, ewa inertia: .186703 
Minibatch iteration 27/360:mean batch inertia: .185294, ewa inertia: .186546 
Minibatch iteration 28/360:mean batch inertia: .171763, ewa inertia: .184902 
Minibatch iteration 29/360:mean batch inertia: .177808, ewa inertia: .184113 
Minibatch iteration 30/360:mean batch inertia: .185854, ewa inertia: .184306 
Minibatch iteration 31/360:mean batch inertia: .185587, ewa inertia: .184449 
Converged (lack of improvement in inertia) at iteration 31/360
Computing label assignements and total inertia
MiniBatchKMeans took 1.1s
cluster sizes MiniBatchKmeans: [423 263 201 189 181 181 180 179]
warning: pconfus true max 9 != est max 7

Confusion matrix: 7.6 % correct = 136 / 1797
True classes down, estimated across  / true class sizes
0:            176              2                      /  178  0 %
1:       133         3   22         1   23            /  182  73 %
2:    2   11    1        12            151            /  177  1 %
3:   10    7            160         3    3            /  183  0 %
4:    8    4                 168    1                 /  181  0 %
5:                   1   27    1  153                 /  182  1 %
6:         4    1  176                                /  181  0 %
7:  171    1                   1    5    1            /  179  1 %
8:    2  100    1    1   50         9   11            /  174  0 %
9:    8    3            152    8    9                 /  180  0 %
  --------------------------------------------------
    201  263  179  181  423  180  181  189    0    0  estimates in each class
    113  145  101   99  234   99  100  106    0    0  est / true %

confus sum col max / total: 86 %
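
One thing that stands out when comparing the two runs: the .12 repr lists both k=10 and n_clusters=8, while the .11 repr has only k=10, and the .12 fit produced exactly 8 clusters. A possible reading, not confirmed here, is that .12 takes the number of clusters from n_clusters and no longer honours k. Under that assumption, a small hypothetical helper (not part of sklearn) could pick the keyword by version so the same script runs on both:

```python
import re

def cluster_count_kwarg(version, n):
    """Hypothetical helper: choose the keyword for the number of clusters.

    Assumption (inferred from the two reprs in the logs): versions before
    0.12 use `k`, and 0.12 onward use `n_clusters`.
    """
    m = re.match(r"(\d*)\.(\d+)", version)
    if m is None:
        raise ValueError("unrecognised version string: %r" % version)
    major = int(m.group(1) or 0)   # the logs print ".12", so allow a missing 0
    minor = int(m.group(2))
    name = "k" if (major, minor) < (0, 12) else "n_clusters"
    return {name: n}

print(cluster_count_kwarg("0.11", 10))  # {'k': 10}
print(cluster_count_kwarg("0.12", 10))  # {'n_clusters': 10}
```

In the script this would be used as MiniBatchKMeans( max_iter=maxiter, ..., **cluster_count_kwarg( __version__, k )) in place of k=k.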