I am still stuck on this problem. Can anyone give me a pointer on how existing systems handle this? If a collection of documents is modified, is the clustering recomputed from scratch each time, or is there an incremental way to handle an evolving set of documents?
I would really appreciate any hint!

Thanks,
David

On 09.05.2011 at 12:45, Ulrich Poppendieck wrote:

> Not an answer, but a follow-up question:
> I would be interested in the very same thing, but with the possibility to
> assign new sites to existing clusters OR to new ones.
>
> Thanks in advance,
> Ulrich
>
> -----Original Message-----
> From: David Saile [mailto:[email protected]]
> Sent: Monday, 9 May 2011 11:53
> To: [email protected]
> Subject: Incremental clustering
>
> Hi list,
>
> I am completely new to Mahout, so please forgive me if the answer to my
> question is too obvious.
>
> For a case study, I am working on a simple incremental web crawler (much like
> Nutch) and I want to include a very simple indexing step that incorporates
> clustering of documents.
>
> I was hoping to use some kind of incremental clustering algorithm, in order
> to make use of the incremental way the crawler is supposed to work (i.e.
> continuously adding and updating websites).
>
> Is there some way to achieve the following:
> 1) initial clustering of the first web-crawl
> 2) assigning new sites to existing clusters
> 3) possibly moving modified sites between clusters
>
> I would really appreciate any help!
>
> Thanks,
> David
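
To make the three steps in the quoted message concrete, here is a minimal sketch of one possible approach: a leader-style, nearest-centroid scheme in plain Python. It is not Mahout's API and not a description of how any existing system works. The class name `IncrementalClusterer`, the `upsert` method, the whitespace tokenizer, and the distance threshold of 0.6 are all assumptions made for illustration. A document joins the nearest cluster if it is close enough and otherwise starts a new cluster, which also covers Ulrich's follow-up. A modified document is removed from its old cluster and reassigned.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Cosine distance between two sparse term-count vectors (dict-like)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


class IncrementalClusterer:
    """Hypothetical helper: assigns documents to clusters one at a time."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed cut-off for opening a new cluster
        self.sums = []              # per-cluster sum of member vectors
        self.sizes = []             # per-cluster member count
        self.docs = {}              # doc_id -> (cluster index, vector)

    def _nearest(self, vec):
        best, best_d = None, float("inf")
        for i, total in enumerate(self.sums):
            if self.sizes[i] == 0:
                continue  # skip clusters that have lost all members
            # Cosine distance ignores scale, so the sum stands in for the mean.
            d = cosine_distance(vec, total)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def _remove(self, doc_id):
        idx, vec = self.docs.pop(doc_id)
        self.sums[idx].subtract(vec)
        self.sizes[idx] -= 1

    def upsert(self, doc_id, text):
        """Add a new document or re-cluster a modified one; return its cluster."""
        vec = Counter(text.lower().split())
        if doc_id in self.docs:
            self._remove(doc_id)  # step 3: a modified document may move
        idx, d = self._nearest(vec)
        if idx is None or d > self.threshold:
            # Nothing close enough: open a new cluster.
            self.sums.append(Counter())
            self.sizes.append(0)
            idx = len(self.sums) - 1
        self.sums[idx].update(vec)  # step 2: join the chosen cluster
        self.sizes[idx] += 1
        self.docs[doc_id] = (idx, vec)
        return idx
```

Calling `upsert` for every page of the first crawl gives the initial clustering (step 1), and later crawls call the same method for new or changed pages. The result depends on insertion order and on the threshold, so an occasional full recomputation may still be worthwhile.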
