You can do agglomerative clustering incrementally, deciding for each new document, as it arrives, which cluster to put it in. Then you have to decide whether, on some schedule or another, to consider 'rebalancing' by moving documents between clusters.
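
As a rough sketch of what I mean, in plain Python rather than any Mahout API: each document is assumed to already be a numeric vector (e.g. TF-IDF), a new document joins the nearest cluster centroid by cosine similarity, and it starts its own cluster if nothing is similar enough. The class name, the 0.8 threshold, and the centroid-based assignment rule are all my own assumptions for illustration, a simplification of true agglomerative merging.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalClusterer:
    """Hypothetical helper: assigns documents to clusters one at a time."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off, tune for your data
        self.clusters = []          # each cluster is a dict: doc_id -> vector

    def _centroid(self, cluster):
        vecs = list(cluster.values())
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def _nearest(self, vec):
        """Return (index, similarity) of the most similar non-empty cluster."""
        best_idx, best_sim = None, -1.0
        for idx, cluster in enumerate(self.clusters):
            if not cluster:
                continue
            sim = cosine(vec, self._centroid(cluster))
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        return best_idx, best_sim

    def _empty_slot(self):
        """Reuse an emptied cluster slot if there is one, else append a new one."""
        for idx, cluster in enumerate(self.clusters):
            if not cluster:
                return idx
        self.clusters.append({})
        return len(self.clusters) - 1

    def remove(self, doc_id):
        for cluster in self.clusters:
            cluster.pop(doc_id, None)

    def add(self, doc_id, vec):
        """Add a new document, or re-place a modified one. Returns the cluster index."""
        self.remove(doc_id)
        idx, sim = self._nearest(vec)
        if idx is None or sim < self.threshold:
            idx = self._empty_slot()
        self.clusters[idx][doc_id] = vec
        return idx

    def rebalance(self):
        """Re-place every document once; returns how many changed cluster index."""
        snapshot = [(i, d, v) for i, c in enumerate(self.clusters)
                    for d, v in c.items()]
        moved = 0
        for old_idx, doc_id, vec in snapshot:
            if self.add(doc_id, vec) != old_idx:
                moved += 1
        return moved
```

Calling `add()` for each crawled page covers the initial clustering and the assignment of new sites, calling `add()` again with the same id after a page changes lets it move, and `rebalance()` is the scheduled pass. Recomputing centroids on every call is wasteful; a real version would keep running sums.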

On Thu, May 12, 2011 at 4:53 AM, David Saile <[email protected]> wrote:
> I am still stuck at this problem.
>
> Can anyone give me a heads-up on how existing systems handle this?
> If a collection of documents is modified, is the clustering recomputed from
> scratch each time?
> Or is there in fact any incremental way to handle an evolving set of
> documents?
>
> I would really appreciate any hint!
>
> Thanks,
> David
>
>
> On 09.05.2011 at 12:45, Ulrich Poppendieck wrote:
>
>> Not an answer, but a follow-up question:
>> I would be interested in the very same thing, but with the possibility to
>> assign new sites to existing clusters OR to new ones.
>>
>> Thanks in advance,
>> Ulrich
>>
>> -----Original Message-----
>> From: David Saile [mailto:[email protected]]
>> Sent: Monday, 9 May 2011 11:53
>> To: [email protected]
>> Subject: Incremental clustering
>>
>> Hi list,
>>
>> I am completely new to Mahout, so please forgive me if the answer to my
>> question is too obvious.
>>
>> For a case study, I am working on a simple incremental web crawler (much
>> like Nutch) and I want to include a very simple indexing step that
>> incorporates clustering of documents.
>>
>> I was hoping to use some kind of incremental clustering algorithm, in order
>> to make use of the incremental way the crawler is supposed to work (i.e.
>> continuously adding and updating websites).
>>
>> Is there some way to achieve the following:
>> 1) initial clustering of the first web-crawl
>> 2) assigning new sites to existing clusters
>> 3) possibly moving modified sites between clusters
>>
>> I would really appreciate any help!
>>
>> Thanks,
>> David
>
>
