Also, if cluster training starts from the posterior of a previous training session over the corpus, and the corpus has only had new data added since that session began, the prior clusters should already be very close to an optimal solution for the enlarged corpus, so the number of iterations needed to converge on a new posterior should be reduced. I haven't tried this in practice, but it seems logical. Convergence is measured by how much each cluster has changed during an iteration.
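To make the idea concrete, here is a minimal sketch in plain Python. It is not Mahout code: it uses ordinary k-means as a stand-in for whichever clustering algorithm is in play, and the `kmeans` function, the `delta` threshold, the seed centroids, and the toy data are all made up for illustration. It first clusters the old corpus, then re-clusters old plus new data twice, once seeded with the previous centroids and once from the original poor seeds, so the iteration counts can be compared.

```python
import math

def kmeans(points, centroids, delta=1e-3, max_iter=100):
    """Lloyd-style k-means; stops once no centroid moves more than `delta`."""
    centroids = [tuple(c) for c in centroids]
    for iteration in range(1, max_iter + 1):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group
        # (an empty group keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        # Convergence test: largest distance any centroid moved this iteration.
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= delta:
            return centroids, iteration
    return centroids, max_iter

# Hypothetical toy corpus: two well-separated groups of 2-D points.
old_data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
new_data = [(0.5, 0.25), (10.5, 10.75)]   # arrived after the first run
seeds = [(0, 0), (0, 1)]                  # deliberately poor starting centroids

# First session: cluster the old corpus from the poor seeds.
prior, prior_iters = kmeans(old_data, seeds)

# Second session, warm start: reuse the previous centroids as the starting point.
warm, warm_iters = kmeans(old_data + new_data, prior)

# Second session, cold start: begin again from the poor seeds.
cold, cold_iters = kmeans(old_data + new_data, seeds)

print("warm start iterations:", warm_iters)
print("cold start iterations:", cold_iters)
```

On this toy data both runs end at the same centroids, and the warm start gets there in fewer iterations than the cold start. That is only a sanity check of the reasoning, not evidence about real corpora, where the saving would depend on how much the new data shifts the clusters.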

-----Original Message-----
From: Benson Margulies [mailto:[email protected]]
Sent: Thursday, May 12, 2011 9:14 AM
To: [email protected]
Subject: Re: AW: Incremental clustering

Is the idea here that you are going to be presented with many different corpora that have some sort of overall resemblance, so that priors derived from the first N speed up clustering N+1?

--benson
