I think this may also have to do with whether k-means retains a sense of
weight for the old clusters. I don't think it currently does.
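
Just to sketch what carrying that weight forward could look like (a
hypothetical illustration in plain Java with invented names, not how
Mahout's clusters currently behave): each cluster keeps an observation
count from earlier runs, so old mass damps the pull of new points.

    import java.util.Arrays;

    // Hypothetical sketch (invented names, not current Mahout behavior):
    // each cluster carries a weight, the total number of points it has
    // absorbed so far, so prior mass survives into the next training run.
    class WeightedCluster {

        double[] center;
        double weight;  // observations absorbed in earlier runs

        WeightedCluster(double[] center, double weight) {
            this.center = center.clone();
            this.weight = weight;
        }

        // Fold one new point into the running weighted mean.
        void observe(double[] point) {
            weight += 1.0;
            for (int i = 0; i < center.length; i++) {
                center[i] += (point[i] - center[i]) / weight;
            }
        }

        public static void main(String[] args) {
            WeightedCluster fresh = new WeightedCluster(new double[] {0, 0}, 0);
            WeightedCluster seasoned = new WeightedCluster(new double[] {0, 0}, 1000);
            double[] outlier = {10, 10};
            fresh.observe(outlier);
            seasoned.observe(outlier);
            System.out.println(Arrays.toString(fresh.center));     // [10.0, 10.0]
            System.out.println(Arrays.toString(seasoned.center));  // roughly [0.01, 0.01]
        }
    }

With weight == 0, the first new point drags the center all the way to
itself, which is effectively what reseeding plain k-means does; with
accumulated weight, the old solution resists drift.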

On Thu, May 12, 2011 at 10:09 AM, David Saile <[email protected]> wrote:

> I had that same thought, so I actually tried running k-Means twice on the
> Reuters dataset (as described in the Quickstart). The second run received
> the resulting clusters of the first run as input.
>
> However, the execution times of the two runs did not differ much (the
> second run was actually a bit slower). I also tried doubling the input
> and the number of iterations, but saw no improvement.
>
> Could this be caused by running Hadoop on a single machine? Or are 20
> (or 40) iterations simply not high enough?
>
> David
>
>
> On 12.05.2011 at 18:46, Jeff Eastman wrote:
>
> > Also, if cluster training begins with the posterior from a previous
> > training session over the corpus, but with new data added since that
> > training began, the prior clusters should be very close to an optimal
> > solution for the new data, and the number of iterations required to
> > converge on a new posterior should be reduced. I haven't tried this in
> > practice, but it seems logical. Convergence is calculated by how much
> > each cluster has changed during an iteration.
> >
> > -----Original Message-----
> > From: Benson Margulies [mailto:[email protected]]
> > Sent: Thursday, May 12, 2011 9:14 AM
> > To: [email protected]
> > Subject: Re: AW: Incremental clustering
> >
> > Is the idea here that you are going to be presented with many
> > different corpora that have some sort of overall resemblance, so that
> > priors derived from the first N speed up clustering corpus N+1?
> >
> > --benson
> >
>
>
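
To make the convergence point concrete, below is a self-contained toy in
plain Java (not Mahout code; the data, seeds, and delta are all made up).
It runs the same one-dimensional k-means twice: once from deliberately bad
seeds, and once seeded with the centers that converged on the old corpus,
over that corpus plus some newly arrived data. It then prints how many
iterations each run needs before every center moves less than the delta,
i.e. the same "how much each cluster has changed" test Jeff describes.

    import java.util.Random;

    public class SeededKMeansDemo {

        // Plain 1-D k-means. Returns the number of iterations until every
        // center moves less than delta in one pass (the per-cluster change
        // test), or maxIter if it never settles. Centers update in place.
        static int kmeans(double[] data, double[] centers, double delta, int maxIter) {
            for (int iter = 1; iter <= maxIter; iter++) {
                double[] sum = new double[centers.length];
                int[] count = new int[centers.length];
                for (double x : data) {
                    int best = 0;
                    for (int c = 1; c < centers.length; c++) {
                        if (Math.abs(x - centers[c]) < Math.abs(x - centers[best])) {
                            best = c;
                        }
                    }
                    sum[best] += x;
                    count[best]++;
                }
                double maxMove = 0;
                for (int c = 0; c < centers.length; c++) {
                    if (count[c] == 0) continue;  // empty cluster: center stays put
                    double updated = sum[c] / count[c];
                    maxMove = Math.max(maxMove, Math.abs(updated - centers[c]));
                    centers[c] = updated;
                }
                if (maxMove < delta) return iter;
            }
            return maxIter;
        }

        // Draw n points spread evenly over five blobs at 0, 10, 20, 30, 40.
        static double[] blobs(int n, Random rnd) {
            double[] data = new double[n];
            for (int i = 0; i < n; i++) {
                data[i] = 10 * (i % 5) + rnd.nextGaussian();
            }
            return data;
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            double[] oldData = blobs(2000, rnd);

            // Cold start: all seeds clumped at one end, a deliberately bad prior.
            double[] centers = {0, 1, 2, 3, 4};
            int coldIters = kmeans(oldData, centers, 1e-6, 100);

            // Warm start: reuse the converged centers on the old corpus plus
            // newly arrived data from the same source, as in the scenario above.
            double[] newData = new double[oldData.length + 200];
            System.arraycopy(oldData, 0, newData, 0, oldData.length);
            System.arraycopy(blobs(200, rnd), 0, newData, oldData.length, 200);
            int warmIters = kmeans(newData, centers, 1e-6, 100);

            System.out.println("cold start iterations: " + coldIters);
            System.out.println("warm start iterations: " + warmIters);
        }
    }

On toy data like this the warm start settles almost immediately. Whether
that shows up as wall-clock savings on a single-machine Hadoop setup is
another matter, since per-iteration job overhead can dominate small runs.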
