Hi, I have around 200 datasets in CSV format. I want to run clustering on each of them independently to find the centroids of each dataset, that is, treating each dataset as a separate clustering problem.
Also, each dataset has an unknown number of clusters (means/centroids). Should I use Canopy clustering to estimate that number? Could someone point me in the right direction for setting up the experiments so the datasets are clustered efficiently? I am a beginner with Mahout. (I have set up Mahout + Hadoop as a single-node installation on an Ubuntu machine.) Regards, Jayendra Rakesh Yeka.
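For illustration, here is a minimal sketch of how Canopy clustering can estimate the number of centroids for each file independently. It is written in plain Python and is not Mahout's implementation or API. Assumptions: each CSV has no header row and contains only numeric columns, distance is Euclidean, and the thresholds `t1 > t2` are picked by hand. The names `load_csv`, `canopy`, and `estimate_k_per_file` are hypothetical.

```python
import csv
import math
from pathlib import Path


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Single-pass canopy clustering; requires t1 > t2.

    Returns a list of (center, member_indices) pairs. Points closer than t1
    to a center join that canopy; points closer than t2 are also removed
    from the candidate list, so they cannot seed or join later canopies.
    """
    if t1 <= t2:
        raise ValueError("canopy clustering requires t1 > t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center_idx = remaining.pop(0)          # next unclaimed point becomes a center
        center = points[center_idx]
        members = [center_idx]
        still_remaining = []
        for idx in remaining:
            d = euclidean(center, points[idx])
            if d < t1:
                members.append(idx)            # loosely bound: belongs to this canopy
            if d >= t2:
                still_remaining.append(idx)    # not tightly bound: stays a candidate
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


def load_csv(path):
    """Assumption: no header row, every field parses as a float."""
    with open(path, newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f) if row]


def estimate_k_per_file(directory, t1, t2):
    """Treat each CSV in `directory` as its own clustering problem."""
    results = {}
    for path in sorted(Path(directory).glob("*.csv")):
        results[path.name] = len(canopy(load_csv(path), t1, t2))
    return results
```

The canopy count for each file can then serve as the k for a separate k-means run on that file. The count is sensitive to `t1` and `t2`, so the thresholds usually need tuning to the scale of each dataset (or the data needs normalizing first).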