Hi list,
I am completely new to Mahout, so please forgive me if the answer to my
question is too obvious.
For a case study, I am working on a simple incremental web crawler (much like
Nutch) and I want to include a very simple indexing step that incorporates
clustering of documents.
I was hoping to use some kind of incremental clustering algorithm that fits
the incremental way the crawler is supposed to work (i.e. continuously adding
and updating websites).
Is there a way to achieve the following with Mahout?
1) initial clustering of the first web crawl
2) assigning new sites to existing clusters
3) possibly moving modified sites between clusters
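
To make the three steps concrete, here is a rough sketch of the behaviour I
have in mind, in plain Java. It does not use any Mahout API: the class
IncrementalClusterer, the method assign, the cosine-distance threshold, and
the idea of opening a new cluster when nothing is close enough are all my own
assumptions for illustration, and documents are assumed to already be
fixed-length term vectors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of threshold-based online clustering (not Mahout API). */
class IncrementalClusterer {
    private final double threshold;                          // max cosine distance to join a cluster
    private final List<double[]> sums = new ArrayList<>();   // per-cluster sum of member vectors
    private final List<Integer> counts = new ArrayList<>();  // per-cluster member count
    private final Map<String, Integer> clusterOf = new HashMap<>(); // site id -> cluster index
    private final Map<String, double[]> vectorOf = new HashMap<>(); // site id -> last seen vector

    IncrementalClusterer(double threshold) {
        this.threshold = threshold;
    }

    /** Adds a new site or re-assigns a modified one; returns its cluster index. */
    int assign(String siteId, double[] vector) {
        // Step 3: a site seen before is first removed from its old cluster.
        Integer old = clusterOf.get(siteId);
        if (old != null) {
            double[] prev = vectorOf.get(siteId);
            double[] oldSum = sums.get(old);
            for (int i = 0; i < oldSum.length; i++) {
                oldSum[i] -= prev[i];
            }
            counts.set(old, counts.get(old) - 1);
        }

        // Step 2: find the nearest non-empty cluster. The summed vector points
        // in the same direction as the centroid, so cosine distance is unchanged.
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < sums.size(); c++) {
            if (counts.get(c) == 0) {
                continue;
            }
            double d = cosineDistance(sums.get(c), vector);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }

        // Nothing close enough (or no clusters yet): open a new cluster.
        // Feeding the first crawl through this method covers step 1.
        if (best < 0 || bestDist > threshold) {
            sums.add(new double[vector.length]);
            counts.add(0);
            best = sums.size() - 1;
        }

        double[] sum = sums.get(best);
        for (int i = 0; i < sum.length; i++) {
            sum[i] += vector[i];
        }
        counts.set(best, counts.get(best) + 1);
        clusterOf.put(siteId, best);
        vectorOf.put(siteId, vector.clone());
        return best;
    }

    /** Cosine distance in [0, 2]; a zero vector is treated as maximally dissimilar (1.0). */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0.0 || nb == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

The crawler would call assign once per fetched page, passing the same site id
again whenever a page changes. What I do not know is whether Mahout already
offers something equivalent, or whether its clustering jobs can be used this
way.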
I would really appreciate any help!
Thanks,
David