Re: kmeans vectors

Jeff Eastman Thu, 30 Sep 2010 06:19:06 -0700

Not using the synthetic control jobs. They always run Canopy over the converted data, and you need to choose t1 and t2 to get the initial k. Once you have run one of them, however, you can copy the data file from the output into another folder. From there you can run k-means or any of the other clustering programs on that data using their normal jobs and normal parameters.
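To illustrate why t1 and t2 determine the initial k, here is a rough single-machine sketch of the canopy idea in plain Python. It is not Mahout's implementation (which runs as MapReduce jobs and differs in detail); the function name build_canopies and the use of Euclidean distance are assumptions made only for this example.

```python
import math


def build_canopies(points, t1, t2):
    """Toy canopy clustering (hypothetical helper, not Mahout code).

    t1 is the loose threshold (canopy membership), t2 the tight one
    (points this close to a center cannot start a new canopy).
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    candidates = list(points)
    canopies = []
    while candidates:
        # Take the next remaining point as a new canopy center.
        center = candidates.pop(0)
        # Every point within t1 of the center belongs to this canopy.
        members = [p for p in points if math.dist(center, p) <= t1]
        canopies.append((center, members))
        # Points within t2 of the center can no longer seed a canopy.
        candidates = [p for p in candidates if math.dist(center, p) > t2]
    return canopies


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
    # The number of canopies is the k that would be handed to k-means.
    print(len(build_canopies(data, t1=3.0, t2=1.0)))
    print(len(build_canopies(data, t1=3.0, t2=0.1)))
```

In this sketch a smaller t2 leaves more points eligible to start canopies, so k goes up; t1 only controls how many points each canopy collects.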

When you run k-means on the data, you can supply a -k argument, and your input points will be randomly sampled to prime the initial cluster centers for the subsequent iterations.
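As a minimal sketch of what "randomly sampled to prime the initial cluster centers" means, the snippet below picks k distinct input points as starting centers. It is plain Python for illustration only and does not reproduce Mahout's own seeding code; the name sample_initial_centers and the optional seed parameter are assumptions of this example.

```python
import random


def sample_initial_centers(points, k, seed=None):
    """Pick k distinct input points to serve as initial k-means centers.

    Hypothetical helper for illustration; not part of Mahout.
    """
    points = list(points)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # A private generator keeps the draw reproducible when seed is given.
    rng = random.Random(seed)
    # sample() draws without replacement, so no point is chosen twice.
    return rng.sample(points, k)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (5.0, 5.0)]
    for center in sample_initial_centers(data, k=3, seed=1):
        print(center)
```

Because the centers are drawn at random, different runs can start from different seeds and converge to different clusterings, which is the usual trade-off against choosing the seeds with Canopy.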

I'm going to move the InputDriver and Mapper to utils since they have general utility outside of the synthetic control example. The driver can be run directly from the command line, and you can do that too.


Smooth sailing,
Jeff


On 9/30/10 1:40 AM, Lahiru Samarakoon wrote:

Hi Jeff,

If we do this for Kmeans, how can we specify the k (number of clusters) and
initial seeds for the algorithm?

I understand that canopy is used for this.

Does Mahout have the flexibility to use Kmeans/Fuzzy Kmeans independent of
Canopy by inputting k and initial seeds externally?

Thanks,
Lahiru
