Yeah, so that is the other thing: text is already so high dimensional. Wouldn't projecting it into an infinite-dimensional vector space be of limited utility, then?
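
To make the question concrete, here is a minimal Python sketch of the two kernels mentioned in this thread: cosine similarity (a linear kernel on L2-normalized vectors) and the parameter-free histogram intersection kernel. The function names and the toy term-count vectors are my own assumptions for illustration, not anyone's actual code or data. The last helper is the squared distance that kernel k-means would use in the induced feature space.

```python
import math


def cosine_kernel(x, y):
    """Cosine similarity: a linear kernel applied to L2-normalized vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_x * norm_y)


def histogram_intersection_kernel(x, y):
    """Sum of element-wise minima; no parameters, expects non-negative features."""
    return sum(min(a, b) for a, b in zip(x, y))


def kernel_distance_sq(kernel, x, y):
    """Squared distance in the feature space induced by `kernel`:
    ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


# Hypothetical term-count vectors over a four-word vocabulary.
doc_a = [2, 0, 1, 0]
doc_b = [1, 0, 1, 0]
doc_c = [0, 3, 0, 1]

if __name__ == "__main__":
    for name, kernel in [("cosine", cosine_kernel),
                         ("hist-intersection", histogram_intersection_kernel)]:
        print(name,
              "d2(a,b) =", kernel_distance_sq(kernel, doc_a, doc_b),
              "d2(a,c) =", kernel_distance_sq(kernel, doc_a, doc_c))
```

With the cosine kernel the induced squared distance is 2 - 2*cos(x, y), so kernel k-means with it is the same as ordinary k-means on unit-normalized vectors. The histogram intersection kernel is the one that actually changes the geometry.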

On Jul 14, 2011, at 11:19 AM, Hector Yee wrote:

> Same reason you would use kernels instead of linear for SVMs... you can get
> more separation in a different space.
> But text is already so high dimensional...
>
> On Thu, Jul 14, 2011 at 11:14 AM, Eshwaran Vijaya Kumar <[email protected]> wrote:
>
>> Assuming the OP was doing cosine similarity (as is commonly done with text)
>> while clustering, wouldn't that implicitly imply the use of a Kernel ? Would
>> using a separate kernel help?
>>
>> On Jul 14, 2011, at 6:56 AM, Hector Yee wrote:
>>
>>> The histogram intersection kernel would work well and it has no parameters
>>>
>>> Sent from my iPad
>>>
>>> On Jul 14, 2011, at 2:38 AM, Vckay <[email protected]> wrote:
>>>
>>>> I am clustering some real world text data using K-Means. I recently came
>>>> across Kernel K-Means and wanted to know if someone who has had experience
>>>> with Kernels could comment on their appropriateness for text data, i.e,
>>>> Would using a Kernel boost k-means quality? ( I know this is rather general
>>>> but it is sort of hard to figure out if my high dimensional real world data
>>>> is linearly separable.) If so, are there any Kernel's with "practically
>>>> accepted" parameters?
>>>>
>>>> Thanks
>>>> VC
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
