I am still stuck on this problem.

Can anyone give me a pointer on how existing systems handle this?
If a collection of documents is modified, is the clustering recomputed from 
scratch each time? 
Or is there an incremental way to handle an evolving set of documents?

I would really appreciate any hint!

Thanks,
David


On 09.05.2011 at 12:45, Ulrich Poppendieck wrote:

> Not an answer, but a follow-up question: 
> I would be interested in the very same thing, but with the possibility to 
> assign new sites to existing clusters OR to new ones.
> 
> Thanks in advance,
> Ulrich
> 
> -----Original Message-----
> From: David Saile [mailto:[email protected]] 
> Sent: Monday, 9 May 2011 11:53
> To: [email protected]
> Subject: Incremental clustering
> 
> Hi list,
> 
> I am completely new to Mahout, so please forgive me if the answer to my 
> question is too obvious.
> 
> For a case study, I am working on a simple incremental web crawler (much like 
> Nutch) and I want to include a very simple indexing step that incorporates 
> clustering of documents.
> 
> I was hoping to use some kind of incremental clustering algorithm, in order 
> to make use of the incremental way the crawler is supposed to work (i.e. 
> continuously adding and updating websites).
> 
> Is there some way to achieve the following:   
>       1) initial clustering of the first web-crawl
>       2) assigning new sites to existing clusters
>       3) possibly moving modified sites between clusters
> 
> I would really appreciate any help!
> 
> Thanks,
> David
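
For illustration only, the three steps asked about above (initial clustering, assigning new sites, moving modified sites) could be sketched roughly as below. This is not Mahout code and not a description of how any existing system works; it is a simple single-pass, threshold-based scheme in plain Python. The class name IncrementalClusterer, the upsert method, the threshold value, and the representation of documents as term-weight dicts are all assumptions made for this sketch. A document that is not similar enough to any existing cluster opens a new one, which also covers the "existing clusters OR new ones" variant.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; every name here is illustrative."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # minimum similarity to join a cluster
        self.sums = {}              # cluster id -> summed term vector
        self.members = {}           # cluster id -> set of doc ids
        self.docs = {}              # doc id -> (cluster id, term vector)
        self.next_id = 0

    def _remove(self, doc_id):
        """Take a document out of its cluster and subtract its vector."""
        cid, vec = self.docs.pop(doc_id)
        self.members[cid].discard(doc_id)
        if not self.members[cid]:
            # Last member left: drop the cluster entirely.
            del self.members[cid]
            del self.sums[cid]
            return
        s = self.sums[cid]
        for t, w in vec.items():
            s[t] -= w
            if abs(s[t]) < 1e-12:
                del s[t]

    def upsert(self, doc_id, vec):
        """Add a new document or re-assign a modified one; return cluster id."""
        if doc_id in self.docs:
            self._remove(doc_id)
        best, best_sim = None, self.threshold
        for cid, s in self.sums.items():
            # Cosine against the summed vector equals cosine against the mean.
            sim = cosine(vec, s)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            # Nothing similar enough: open a new cluster.
            best = self.next_id
            self.next_id += 1
            self.sums[best] = {}
            self.members[best] = set()
        s = self.sums[best]
        for t, w in vec.items():
            s[t] = s.get(t, 0.0) + w
        self.members[best].add(doc_id)
        self.docs[doc_id] = (best, dict(vec))
        return best


if __name__ == "__main__":
    c = IncrementalClusterer(threshold=0.5)
    print(c.upsert("site-a", {"java": 1.0, "mahout": 1.0}))
    print(c.upsert("site-b", {"java": 1.0, "mahout": 1.0, "hadoop": 1.0}))
    print(c.upsert("site-c", {"cooking": 1.0, "pasta": 1.0}))
    # site-b was re-crawled and its content changed completely.
    print(c.upsert("site-b", {"cooking": 1.0, "pasta": 1.0, "sauce": 1.0}))
```

The result of a scheme like this depends on the order in which documents arrive and on the chosen threshold, so an occasional full re-clustering would probably still be needed to keep the clusters from drifting.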
