Dirichlet clustering maintains weights (total point counts accumulated over all iterations) in the mixture, but k-means does not have anything equivalent.
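To make that distinction concrete, here is a minimal sketch in plain Java of the two update styles. The class names and fields are hypothetical and are not Mahout's actual Cluster implementations; the point is only that a Dirichlet-style component carries an accumulated weight across iterations, while a k-means centroid is recomputed from each iteration's points alone.

import java.util.Arrays;

// Hypothetical: a cluster that carries an accumulated weight, as a
// Dirichlet mixture component does.
final class WeightedCluster {
    double[] center;
    double weight; // total count of points absorbed over ALL iterations

    WeightedCluster(double[] center, double weight) {
        this.center = center.clone();
        this.weight = weight;
    }

    // Folding in a new point moves the center in proportion to the
    // accumulated weight: a heavy (long-trained) cluster barely moves.
    void observe(double[] point) {
        weight += 1.0;
        for (int i = 0; i < center.length; i++) {
            center[i] += (point[i] - center[i]) / weight;
        }
    }
}

// Hypothetical: a plain k-means centroid, whose statistics are reset
// every iteration.
final class KMeansCluster {
    double[] sum; // per-iteration running sum
    int count;    // per-iteration count, discarded afterwards

    KMeansCluster(int dim) {
        this.sum = new double[dim];
    }

    void observe(double[] point) {
        count++;
        for (int i = 0; i < sum.length; i++) sum[i] += point[i];
    }

    // The new center depends only on THIS iteration's points; nothing
    // of the previous iterations' counts survives.
    double[] recomputeCenter() {
        double[] center = Arrays.copyOf(sum, sum.length);
        for (int i = 0; i < center.length; i++) center[i] /= count;
        sum = new double[sum.length]; // reset for the next iteration
        count = 0;
        return center;
    }
}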
-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Thursday, May 12, 2011 10:16 AM
To: [email protected]
Subject: Re: AW: Incremental clustering

I think that this may also have to do with whether k-means retains a sense
of weight for the old clusters. I don't think it currently does.

On Thu, May 12, 2011 at 10:09 AM, David Saile <[email protected]> wrote:

> I had the same thought, so I actually tried running k-means twice on the
> Reuters dataset (as described in the Quickstart).
> The second run received the resulting clusters of the first run as input.
>
> However, the execution times of the two runs did not differ much (in
> fact, the second run was a bit slower).
> I also tried doubling the input and the number of iterations, but saw no
> improvement.
>
> Could this be caused by running Hadoop on a single machine?
> Or is the number of iterations, at 20 (or 40), simply not high enough?
>
> David
>
>
> On 12.05.2011, at 18:46, Jeff Eastman wrote:
>
> > Also, if cluster training begins with the posterior from a previous
> > training session over the corpus, but with new data added since that
> > training began, the prior clusters should be very close to an optimal
> > solution with the new data, and the number of iterations required to
> > converge on a new posterior should be reduced. I haven't tried this in
> > practice, but it seems logical. Convergence is calculated by how much
> > each cluster has changed during an iteration.
> >
> > -----Original Message-----
> > From: Benson Margulies [mailto:[email protected]]
> > Sent: Thursday, May 12, 2011 9:14 AM
> > To: [email protected]
> > Subject: Re: AW: Incremental clustering
> >
> > Is the idea here that you are going to be presented with many
> > different corpora that have some sort of overall resemblance, so that
> > priors derived from the first N speed up clustering of corpus N+1?
> >
> > --benson
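Jeff's suggestion above, that seeding with the posterior of a previous run should cut the number of iterations needed to converge, can be sketched as follows. This is a hypothetical, self-contained illustration, not Mahout's KMeansDriver: cluster() takes the prior centroids as the starting centers and stops once the largest centroid displacement in an iteration falls below convergenceDelta, mirroring how convergence is described in the thread.

import java.util.List;

// Hypothetical warm-started Lloyd iteration; names are illustrative.
final class WarmStartKMeans {

    static double[][] cluster(List<double[]> points, double[][] priorCenters,
                              double convergenceDelta, int maxIterations) {
        double[][] centers = deepCopy(priorCenters);
        for (int iter = 0; iter < maxIterations; iter++) {
            int k = centers.length, dim = centers[0].length;
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];

            // Assignment step: each point goes to its nearest center.
            for (double[] p : points) {
                int best = nearest(centers, p);
                counts[best]++;
                for (int d = 0; d < dim; d++) sums[best][d] += p[d];
            }

            // Update step; track the largest centroid displacement.
            double maxShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster stays put
                double shift = 0.0;
                for (int d = 0; d < dim; d++) {
                    double updated = sums[c][d] / counts[c];
                    shift += (updated - centers[c][d]) * (updated - centers[c][d]);
                    centers[c][d] = updated;
                }
                maxShift = Math.max(maxShift, Math.sqrt(shift));
            }

            // If the prior centers were near-optimal for the new data,
            // this triggers after very few iterations.
            if (maxShift < convergenceDelta) break;
        }
        return centers;
    }

    private static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = centers[c][d] - p[d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }

    private static double[][] deepCopy(double[][] a) {
        double[][] out = new double[a.length][];
        for (int i = 0; i < a.length; i++) out[i] = a[i].clone();
        return out;
    }
}

Note that seeding from priorCenters instead of random points is the only change from a standard run, so any speed-up would show up as fewer iterations, not faster ones.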
