On Aug 25, 2010, at 9:32am, Young wrote:

I am using Mahout to cluster news articles, and I can see the top words for each cluster. But I am very keen to know how to define a topic for each cluster. Do we have to hard-code the topic for each cluster?

I found an interesting site, http://search.carrot2.org/stable/search, which does excellent topic clustering based on the page content.

You can use Carrot2 to generate labels for clusters, but in my experience it has issues when the individual elements in the dataset are large. Carrot2 is optimized for clustering and labeling search results, and seems to key off the phrases found in web page titles and search summaries.

Next I was going to try to derive SIPs (statistically improbable phrases) from documents in the cluster, but we ran out of time on that project.
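
For what it's worth, here is a rough sketch of what the SIP idea might look like. It is not something we built, and it does not use any Mahout or Carrot2 API. It assumes a "phrase" is a two-word sequence and that "improbable" means the phrase is much more frequent in the cluster than in a background corpus, scored with a smoothed log ratio. The names improbable_phrases, top_n, and min_count are made up for illustration.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text and return its adjacent word pairs as strings."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def improbable_phrases(cluster_docs, background_docs, top_n=3, min_count=2):
    """Return the bigrams most over-represented in the cluster.

    Assumption: the score is the log of the ratio between a phrase's
    add-one-smoothed relative frequency in the cluster and in the background.
    """
    cluster_counts = Counter(p for doc in cluster_docs for p in bigrams(doc))
    background_counts = Counter(p for doc in background_docs for p in bigrams(doc))

    cluster_total = sum(cluster_counts.values())
    background_total = sum(background_counts.values())
    vocab_size = len(set(cluster_counts) | set(background_counts))

    scored = []
    for phrase, count in cluster_counts.items():
        if count < min_count:
            continue  # skip phrases too rare in the cluster to trust
        p_cluster = (count + 1) / (cluster_total + vocab_size)
        p_background = (background_counts[phrase] + 1) / (background_total + vocab_size)
        scored.append((math.log(p_cluster / p_background), phrase))

    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in scored[:top_n]]
```

The top-scoring phrases could then serve as candidate labels for the cluster, though a real corpus would likely need stop-word filtering and longer phrases as well.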

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



