Hi,

Clarifying my question a little bit:

How can I create a vector from a single text document that conforms to the schema of the collection of vectors I created earlier with seq2sparse? I want to use it to find the closest centroid to a text document submitted by a client.

Best

2014/1/16 John White <[email protected]>

> Hello,
> I use seq2sparse with the -wt tfidf option and execute the kmeans pipeline. If
> new data comes at a later date, should I decide which cluster it belongs to
> using "Listing 9.4 News clustering using canopy generation and k-means
> clustering" in "Mahout in Action", or is there a better/more generic (i.e.
> that can work with other algorithms using text input) way? Specifically, I
> need a way to access the dictionary and tfidf of the training set data when
> testing incrementally.
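
To make the goal concrete, here is a rough sketch in plain Python of what I am trying to achieve. It does not use the Mahout API. The dictionary, document frequencies, document count, and centroids are made-up stand-ins for what seq2sparse and k-means would have produced on the training set, and the weighting is a textbook tf * log(N/df), which may not match Mahout's exact TF-IDF formula.

```python
import math
from collections import Counter


def tfidf_vector(tokens, dictionary, doc_freq, num_docs):
    """Build a sparse {term_index: weight} vector using a fixed training dictionary.

    Tokens that are not in the training dictionary are dropped, so the
    result lives in the same vector space as the training vectors.
    """
    counts = Counter(t for t in tokens if t in dictionary)
    vector = {}
    for term, tf in counts.items():
        index = dictionary[term]
        # Textbook IDF (an assumption; Mahout's formula may differ).
        idf = math.log(num_docs / doc_freq[index])
        vector[index] = tf * idf
    return vector


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(i, 0.0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def closest_centroid(vector, centroids):
    """Return the id of the centroid with the smallest cosine distance."""
    return min(centroids, key=lambda cid: cosine_distance(vector, centroids[cid]))


# Hypothetical stand-ins for the training-set artifacts.
dictionary = {"mahout": 0, "cluster": 1, "football": 2, "goal": 3}  # term -> index
doc_freq = {0: 2, 1: 5, 2: 4, 3: 5}                                 # index -> document frequency
num_docs = 10
centroids = {
    "tech": {0: 1.2, 1: 0.8},
    "sports": {2: 1.0, 3: 0.9},
}

# A new client document, already tokenized the same way as the training data.
new_doc = "mahout cluster cluster unseenword".split()
vec = tfidf_vector(new_doc, dictionary, doc_freq, num_docs)
best = closest_centroid(vec, centroids)
print(best)
```

The part I do not know how to do is loading the real dictionary and document-frequency data written by seq2sparse in place of these hard-coded values.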
