Incremental Clustering from Text Data

John White Thu, 16 Jan 2014 00:04:38 -0800

Hello,
I use seq2sparse with the -wt tfidf option and then run the k-means pipeline. If
new data arrives at a later date, should I decide which cluster it belongs to
by following "Listing 9.4 News clustering using canopy generation and k-means
clustering" in "Mahout in Action", or is there a better, more generic way
(i.e. one that also works with other algorithms that take text input)?
Specifically, I need a way to access the dictionary and the tfidf data of the
training set when testing incrementally.
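To make the question concrete, here is a minimal sketch in plain Python (not Mahout's API) of the general technique being asked about: vectorize a new document using the training run's dictionary and document-frequency counts, then assign it to the nearest existing centroid by cosine distance. It assumes the dictionary (term to index), the df counts (index to count), the number of training documents, and the final centroids have already been exported into plain Python dicts. The function names and the toy data are hypothetical, and the tf-idf formula is a textbook variant chosen for illustration; it may not match the weighting seq2sparse actually applies, so a real version would have to reproduce whatever weighting built the training vectors.

```python
import math
from collections import Counter


def tfidf_vector(tokens, dictionary, df, num_docs):
    """Build a sparse {index: weight} vector for a NEW document using the
    TRAINING dictionary and df counts. Terms unseen in training are dropped,
    so the new vector lives in the same feature space as the centroids."""
    counts = Counter(t for t in tokens if t in dictionary)
    vec = {}
    for term, tf in counts.items():
        idx = dictionary[term]
        # Assumption: textbook tf * log(N / df); not necessarily Mahout's formula.
        vec[idx] = tf * math.log(num_docs / df[idx])
    return vec


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(i, 0.0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def assign_cluster(vec, centroids):
    """Return the id of the centroid nearest to vec (hypothetical helper)."""
    return min(centroids, key=lambda cid: cosine_distance(vec, centroids[cid]))


# Hypothetical toy data standing in for the exported training artifacts.
dictionary = {"mahout": 0, "cluster": 1, "hadoop": 2, "recipe": 3}
df = {0: 2, 1: 3, 2: 2, 3: 1}
num_docs = 4
centroids = {"tech": {0: 1.0, 2: 1.0}, "food": {3: 1.0}}

new_doc = "mahout hadoop cluster pasta".split()
vec = tfidf_vector(new_doc, dictionary, df, num_docs)
print(assign_cluster(vec, centroids))  # prints "tech"
```

The key design point is that df and num_docs stay frozen at their training values: recomputing them over the new data would shift every weight and the new vectors would no longer be comparable to the existing centroids. Unknown terms ("pasta" above) are simply ignored until the model is retrained.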
