You can't do that with the synthetic control jobs. They always run Canopy over the converted data, and you need to choose t1 and t2 to get the initial k. Once you have run a job once, however, you can copy the data file from the output into another folder. From there you can run k-means or any of the other clustering programs on that data using their normal jobs and normal parameters.
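
To make the role of t1 and t2 concrete, here is a minimal sketch of the canopy idea in plain Python. It is not Mahout's implementation: the function name `canopy_centers` and the sample points are made up for illustration, and it uses the first remaining point as each canopy's center rather than computing a centroid. The number of canopies it returns plays the part of the initial k.

```python
import math

def canopy_centers(points, t1, t2):
    """Simplified canopy pass (illustrative only); requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unclaimed point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)          # loosely belongs to this canopy
            if d >= t2:
                still_remaining.append(p)  # far enough to seed another canopy
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
canopies = canopy_centers(points, t1=3.0, t2=1.0)
k = len(canopies)  # the canopy count becomes the initial k
```

In this sketch, larger thresholds give fewer canopies and smaller ones give more, which is why k depends on the t1 and t2 you choose.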

When you run k-means on the data, you can supply a -k argument, and your input points will be randomly sampled to prime the initial cluster centers for the subsequent iterations.
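
As a rough illustration of what random seeding means, the sketch below picks k distinct input points to act as the initial centers. It is plain Python, not Mahout code; `sample_initial_centers` and its `seed` parameter are hypothetical names used only for this example.

```python
import random

def sample_initial_centers(points, k, seed=None):
    """Pick k distinct input points to serve as initial cluster centers."""
    points = list(points)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(points, k)

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (5.0, 5.0)]
centers = sample_initial_centers(points, k=2, seed=42)
```

Because the centers are sampled, different runs can start from different points and may converge to different clusterings.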

I'm going to move the InputDriver and Mapper to utils since they have general utility outside of the synthetic control example. The driver can be run directly from the command line, so you can use it that way too.

Smooth sailing,
Jeff


On 9/30/10 1:40 AM, Lahiru Samarakoon wrote:
Hi Jeff,

If we do this for Kmeans, how can we specify the k (number of clusters) and
initial seeds for the algorithm?

I understand that canopy is used for this.

Does Mahout have the flexibility to use Kmeans/Fuzzy Kmeans independent of
Canopy by inputting k and initial seeds externally?

Thanks,
Lahiru

