Not too sure what you mean by "raw text data". I am doing the usual:
removing stop words, stemming, etc., and then computing TF-IDF vectors
before trying to cluster them.
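
For concreteness, a minimal sketch of that kind of preprocessing in plain
Python. Everything named here is an assumption made for illustration: the
tiny stop-word list, the crude_stem helper (a stand-in for a real stemmer
such as Porter) and the sample documents. It is not the actual pipeline or
data, and the TF-IDF weighting shown (raw count times log(N/df), then L2
normalisation) is only one common variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "on"}


def crude_stem(token):
    """Hypothetical placeholder for a real stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words, stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse, L2-normalised TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Normalise so that a dot product of two vectors is their cosine.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors that are already L2-normalised."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Made-up sample documents, for illustration only.
docs = [
    "Clustering documents with k-means",
    "K-means clusters the documents",
    "Kernels for support vector machines",
]
vectors = tfidf_vectors(docs)
```

The resulting vectors are sparse and unit length, so cosine(vectors[i],
vectors[j]) gives the similarity that a k-means step would then work with.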


2011/7/14 Fernando Fernández <[email protected]>

> Hi Vckay,
>
> Are you using raw text data with k-means? It's common to obtain a
> lower-dimensional, dense representation of the documents using Singular
> Value Decomposition or similar techniques, and to work with that
> representation instead. You may want to take a look at the SVD algorithms
> in Mahout.
>
> Best,
> Fernando.
>
> 2011/7/14 Vckay <[email protected]>
>
> > I am clustering some real-world text data using K-Means. I recently came
> > across Kernel K-Means and wanted to know if someone who has had
> > experience with kernels could comment on their appropriateness for text
> > data, i.e., would using a kernel boost k-means quality? (I know this is
> > rather general, but it is hard to figure out whether my high-dimensional
> > real-world data is linearly separable.) If so, are there any kernels
> > with "practically accepted" parameters?
> >
> > Thanks
> > VC
> >
>
