Hi Vckay,

Are you using raw text data with k-means? It is common to first obtain a lower-dimensional, dense representation of the documents using Singular Value Decomposition or a similar technique, and then to cluster that representation instead of the raw term vectors. You may want to take a look at the SVD algorithms in Mahout.
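
To make the idea concrete, here is a minimal sketch in plain Python, not Mahout code. It builds term counts for a tiny made-up corpus, projects the documents onto their top two singular directions, and runs k-means on the projected points. The corpus, the helper names (`top_eigenpairs`, `kmeans`) and the choice of two dimensions are all assumptions for illustration; on real data you would use a library SVD and many more dimensions.

```python
import math
import random

# Toy corpus (invented for illustration): two documents per topic.
docs = [
    "kernel svm margin classifier kernel",
    "svm classifier margin training",
    "soccer goal league match",
    "league match soccer player goal",
]

# 1. Bag-of-words counts: one sparse, high-dimensional row per document.
vocab = sorted({w for d in docs for w in d.split()})
A = [[d.split().count(w) for w in vocab] for d in docs]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_eigenpairs(G, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(G)
    G = [row[:] for row in G]  # work on a copy, it is modified below
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [dot(row, v) for row in G]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in G])
        pairs.append((lam, v))
        # Deflate: subtract the component that was just found.
        for i in range(n):
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs


# 2. Truncated SVD via the document Gram matrix G = A A^T.
#    Eigenvectors of G are the left singular vectors of A, and the
#    eigenvalues are the squared singular values.
G = [[dot(a, b) for b in A] for a in A]
pairs = top_eigenpairs(G, k=2)
# Dense 2-d coordinates of each document: rows of U_k * Sigma_k.
embedding = [[math.sqrt(lam) * v[i] for lam, v in pairs] for i in range(len(docs))]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# 3. Cluster the dense representation instead of the raw term counts.
labels = kmeans(embedding, k=2)
for doc, label in zip(docs, labels):
    print(label, doc)
```

On this toy corpus the two machine-learning documents and the two soccer documents should end up in separate clusters. The point of the sketch is only the order of the steps: vectorise, reduce with SVD, then cluster the reduced vectors.
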
Best, Fernando.

2011/7/14 Vckay <[email protected]>

> I am clustering some real world text data using K-Means. I recently came
> across Kernel K-Means and wanted to know if someone who has had experience
> with Kernels could comment on their appropriateness for text data, i.e.,
> would using a Kernel boost k-means quality? (I know this is rather general
> but it is sort of hard to figure out if my high dimensional real world data
> is linearly separable.) If so, are there any Kernels with "practically
> accepted" parameters?
>
> Thanks
> VC
