Re: Incremental Clustering from Text Data

John White Thu, 16 Jan 2014 10:08:40 -0800

Hi,

Clarifying my question a little bit:


How can I create a vector from a single text document that conforms to the
schema of the collection of vectors I created earlier with seq2sparse?
I want to use it to find the closest centroid to a text document
submitted by a client.
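
To make the question concrete, here is a minimal sketch of the operation being asked about, written in plain Python rather than against the Mahout API. It assumes the term-to-index dictionary and the document-frequency counts from the training run have already been loaded into ordinary dicts (reading them from the seq2sparse output is not shown). The function names, the toy data, the whitespace tokenizer, and the textbook weight tf * log(N / df) are all assumptions for illustration; the weighting seq2sparse actually applies may differ, and the tokenizer would have to match the analyzer used for training.

```python
import math
from collections import Counter


def vectorize(text, dictionary, doc_freq, num_docs):
    """Map one document onto the training vocabulary as a sparse tf-idf vector."""
    # Naive whitespace tokenizer (assumption): must mirror the training analyzer.
    tokens = text.lower().split()
    # Terms absent from the training dictionary have no index, so they are dropped.
    counts = Counter(t for t in tokens if t in dictionary)
    vector = {}
    for term, tf in counts.items():
        index = dictionary[term]
        # Textbook idf (assumption), computed from training-set statistics only.
        idf = math.log(num_docs / doc_freq[index])
        vector[index] = tf * idf
    return vector


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as {index: weight}."""
    dot = sum(w * b.get(i, 0.0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def closest_centroid(vector, centroids):
    """Return the id of the centroid with the smallest cosine distance."""
    return min(centroids, key=lambda cid: cosine_distance(vector, centroids[cid]))


# Hypothetical toy data standing in for the training-run artifacts.
dictionary = {"mahout": 0, "cluster": 1, "football": 2, "goal": 3}
doc_freq = {0: 2, 1: 3, 2: 4, 3: 5}
num_docs = 10
centroids = {"tech": {0: 1.0, 1: 0.8}, "sports": {2: 1.0, 3: 0.9}}

new_doc = vectorize("Mahout cluster cluster unseenword", dictionary, doc_freq, num_docs)
assigned = closest_centroid(new_doc, centroids)
```

The key point of the sketch is that the new document is weighted with the training set's dictionary and document frequencies, not with statistics recomputed from the new document alone, so its vector lives in the same space as the centroids.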

Best


2014/1/16 John White <[email protected]>

> Hello,
> I use seq2sparse with the -wt tfidf option and then run the k-means
> pipeline. If new data arrives at a later date, should I decide which
> cluster it belongs to using "Listing 9.4 News clustering using canopy
> generation and k-means clustering" in "Mahout in Action", or is there a
> better, more generic way (i.e. one that can work with other algorithms
> using text input)? Specifically, I need a way to access the dictionary and
> tf-idf weights of the training set data when testing incrementally.
>
