Re: How to define a topic for cluster.

Jeff Eastman Wed, 25 Aug 2010 10:02:38 -0700

Hi Young,

You did not mention which part(s) of Mahout you are using, but I will assume the clustering code. LDA (Latent Dirichlet Allocation) is designed to deduce a set of topics from a corpus of documents and does not require or allow the topics to be predefined. Some of the other clustering algorithms (e.g. k-Means, Fuzzy k-Means, Dirichlet) can be initialized with a set of topics (clusters), but after the iterations these will likely have changed significantly. K-Means can also be initialized by running Canopy over your dataset, but none of the Mahout clustering algorithms require any hard-coding. Once you have developed a set of topics (generally an offline, batch process), you can use one of the clustering implementations to quickly cluster new documents using those topics.
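A minimal sketch of that last step, in plain Python and independent of Mahout's own classes: each topic is represented by a centroid term vector, and a new document is assigned to whichever centroid has the highest cosine similarity. The helper names (term_vector, cosine, assign), the whitespace tokenizer and the hand-built example centroids are all hypothetical; a real setup would use weighted (e.g. TF-IDF) vectors and the centroids produced by the offline clustering run.

```python
import math


def term_vector(text: str) -> dict[str, float]:
    """Raw term counts from a naive lowercase/whitespace tokenizer (illustrative only)."""
    counts: dict[str, float] = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign(doc: dict[str, float], centroids: dict[str, dict[str, float]]) -> str:
    """Return the id of the centroid most similar to doc."""
    return max(centroids, key=lambda cid: cosine(doc, centroids[cid]))


# Hypothetical centroids standing in for the output of an offline clustering run.
centroids = {
    "sports": term_vector("match goal team score"),
    "markets": term_vector("stock shares price market"),
}

print(assign(term_vector("the team scored a late goal"), centroids))  # sports
```

The labels "sports" and "markets" are just names a person attached to each centroid, for example after inspecting its top words; the clustering itself only ever sees the vectors.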

Of course, if you really want to use predefined topics, then you should look at some of the classification algorithms, which can be trained to sort your news articles on the fly.
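To illustrate the supervised route, here is a small multinomial Naive Bayes classifier with Laplace smoothing in plain Python. It is not Mahout's classifier API; the class name, the whitespace tokenizer and the toy training articles are hypothetical. The difference from clustering is that the topic labels are supplied up front with the training examples, so each new article is sorted into one of those predefined topics.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial Naive Bayes text classifier (illustrative only)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.doc_counts = Counter()               # number of training docs per label
        self.word_counts = defaultdict(Counter)   # token counts per label
        self.vocab = set()                        # all tokens seen in training

    def train(self, text: str, label: str) -> None:
        tokens = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def classify(self, text: str):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log prior + sum of smoothed log likelihoods of each token
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / (total_words + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("team wins the match with a late goal", "sports")
nb.train("coach praises team after match", "sports")
nb.train("stock market falls as shares drop", "business")
nb.train("investors sell shares amid market fears", "business")

print(nb.classify("shares rally in the stock market"))  # business
```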


Jeff


On 8/25/10 9:32 AM, Young wrote:

Hi all,
I am using Mahout to cluster news articles, and I can see the top words for
each cluster. But I am very keen to know how to define a topic for each
cluster. Do we have to hardcode the topic for the cluster?

I found an interesting site, http://search.carrot2.org/stable/search, which produces
excellent topic clustering based on the page content.

Thank you very much.

--Young
