I think mahout-core (and its internal dependencies) can do most of what you need.

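You will have to create your vectors yourself and write them to HDFS, typically as a SequenceFile of VectorWritable values, since that is the input the k-means job reads.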

Then use KMeansDriver's run method to do clustering.

Then use ClusterOutputPostProcessor to separate out vectors belonging to different clusters in their specific directories.

Then write some code to read the cluster-specific vectors back from those directories.
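
For intuition about what the steps above compute, here is a minimal sketch in plain Java. It deliberately uses neither Mahout nor Hadoop: the class KMeansSketch and its methods are hypothetical names made up for illustration, the vectors are plain double[] arrays held in memory, and the distance is assumed to be squared Euclidean. In the real pipeline, KMeansDriver does the clustering and ClusterOutputPostProcessor does the grouping, both on files in HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, dependency-free illustration; not Mahout's API.
final class KMeansSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Lloyd's k-means: assign each point to its nearest centroid, then move
     * each centroid to the mean of its points, until nothing changes or
     * maxIterations (assumed >= 1) is reached. Returns one cluster index per point.
     */
    static int[] cluster(double[][] points, double[][] initialCentroids, int maxIterations) {
        int k = initialCentroids.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int c = nearest(points[p], centroids);
                if (c != assignment[p]) {
                    assignment[p] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: recompute each non-empty cluster's mean.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    /** Groups the vectors by cluster index, like one directory per cluster. */
    static Map<Integer, List<double[]>> groupByCluster(double[][] points, int[] assignment) {
        Map<Integer, List<double[]>> groups = new TreeMap<>();
        for (int p = 0; p < points.length; p++) {
            groups.computeIfAbsent(assignment[p], key -> new ArrayList<>()).add(points[p]);
        }
        return groups;
    }
}
```

Calling KMeansSketch.cluster(points, initialCentroids, 10) and then KMeansSketch.groupByCluster(points, assignment) mirrors the "cluster, then separate by cluster, then read each group" sequence, only in memory instead of on HDFS.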

PS: Reading from and writing to HDFS is simple.

On 03-01-2012 15:57, rahul raghavendhra wrote:
I am new to Mahout. I have checked out the trunk from svn and built it with
mvn. Now I wish to write a Java program (instead of the shell scripts
build-reuters.sh/cluster-reuters.sh) that performs k-means clustering by
calling the methods, or by creating instances (if possible), of the classes
which convert the dataset into sequence files and then into sparse vectors,
then apply k-means, and then dump and display the clusters.

i.e., to perform k-means clustering on the Reuters dataset without using
./mahout <seqdirectory | seq2sparse | kmeans | clusterdump>

What are the jars that need to be imported?

Can that program be developed using Eclipse and run there?

Kindly help: what are the steps to follow?

thanks in advance

./rahul


