Re: Writing java program for performing kmeans clustering on reuters dataset instead of ./mahout ,Steps to Follow

Paritosh Ranjan Tue, 03 Jan 2012 02:32:51 -0800

I think mahout-core (and its internal dependencies) can do most of what you need.


You will have to create your vectors yourself and write to HDFS.


Then use KMeansDriver's run method to do clustering.

Then use ClusterOutputPostProcessor to separate the vectors belonging to different clusters into cluster-specific directories.


Then write some code to read the cluster-specific vectors from those directories.
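
To make the "create your vectors, then cluster them" part concrete, here is a minimal plain-Java sketch with no Mahout or Hadoop dependency. It is only a stand-in for illustration: it builds term-frequency vectors as double arrays and runs a small in-memory k-means (Lloyd's algorithm with squared Euclidean distance), which is not how KMeansDriver is invoked. The class and method names (KMeansSketch, vocabulary, termFrequencies, cluster) and the four toy documents are made up for this example. In a real Mahout program the vectors would be Mahout vectors written to HDFS, and the clustering call would be KMeansDriver's run method, whose arguments depend on the Mahout version.

```java
import java.util.*;

public class KMeansSketch {

    // Collect every distinct term across the tokenized documents, sorted.
    static List<String> vocabulary(List<String[]> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (String[] doc : docs) terms.addAll(Arrays.asList(doc));
        return new ArrayList<>(terms);
    }

    // Turn one tokenized document into a term-frequency vector over vocab.
    static double[] termFrequencies(String[] doc, List<String> vocab) {
        double[] v = new double[vocab.size()];
        for (String term : doc) {
            int i = Collections.binarySearch(vocab, term);
            if (i >= 0) v[i] += 1.0;
        }
        return v;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Lloyd's algorithm. Returns the cluster index of each input vector.
    // Assumption for brevity: the first k vectors seed the centroids,
    // so vectors.length must be at least k.
    static int[] cluster(double[][] vectors, int k, int maxIterations) {
        int dim = vectors[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = vectors[c].clone();
        int[] assignment = new int[vectors.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each vector to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < vectors.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(vectors[i], centroids[c])
                            < squaredDistance(vectors[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged

            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < vectors.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += vectors[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep an empty cluster's centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Toy documents, already tokenized (hypothetical data).
        List<String[]> docs = Arrays.asList(
                "oil prices rise".split(" "),
                "wheat harvest grain".split(" "),
                "oil prices fall".split(" "),
                "grain wheat exports".split(" "));
        List<String> vocab = vocabulary(docs);
        double[][] vectors = new double[docs.size()][];
        for (int i = 0; i < docs.size(); i++) {
            vectors[i] = termFrequencies(docs.get(i), vocab);
        }
        System.out.println(Arrays.toString(cluster(vectors, 2, 10)));
    }
}
```

Grouping the vectors by the returned index corresponds roughly to what the post-processing step does when it separates vectors into per-cluster directories.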

PS: Reading from and writing to HDFS is simple.

On 03-01-2012 15:57, rahul raghavendhra wrote:

I am new to Mahout. I have checked out the trunk from svn and installed it using mvn. Now I
wish to write a Java program (instead of the shell scripts
build-reuters.sh/cluster-reuters.sh) that performs k-means clustering by
calling the methods, or by creating instances (if possible), of the classes
which convert the dataset into sequence files, then into sparse vectors, then apply
k-means, and then run the cluster dump to display the k-means result.

That is, to perform k-means clustering on the Reuters dataset without using
./mahout <seqdirectory | seq2sparse | kmeans | clusterdump>

Which jars need to be imported?

Can that program be developed in Eclipse and run there?

Kindly help: what are the steps to follow?

Thanks in advance

./rahul



