Not with the synthetic control jobs. They always run Canopy over the
converted data, so you need to choose t1 and t2 to get the initial k.
Once you have run it once, however, you can copy the data file from the
output into another folder. From there you can run k-means or any of the
other clustering programs on that data using their normal jobs and
normal parameters.
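To give a feel for how t1 and t2 end up determining k, here is a minimal
Python sketch of the textbook canopy procedure. It is not Mahout's code:
the function name and the sample points are made up for illustration, and
it assumes Euclidean distance and t1 > t2.

```python
import math


def build_canopies(points, t1, t2):
    """Hypothetical helper: return a list of (center, members) canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight one
    (points this close to a center cannot start their own canopy).
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unclaimed point as a new canopy center.
        center = remaining.pop(0)
        # Every still-unclaimed point within t1 joins this canopy.
        members = [center] + [p for p in remaining if math.dist(center, p) < t1]
        canopies.append((center, members))
        # Points within t2 are removed so they cannot seed another canopy.
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
    return canopies


# Illustrative data only: two well-separated groups of points.
sample_points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
canopies = build_canopies(sample_points, t1=3.0, t2=1.0)
k = len(canopies)  # the number of canopies is the initial k
```

Larger t2 values swallow more points per canopy and so give a smaller k;
smaller values give more canopies.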
When you run k-means on the data, you can supply a -k argument, and k of
your input points will be randomly sampled to prime the initial cluster
centers for the subsequent iterations.
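The idea of priming the centers from a random sample can be sketched in
a few lines of Python. This is only an illustration under my own
assumptions (Euclidean distance, a fixed iteration count, hypothetical
function names), not how the Mahout job is implemented.

```python
import math
import random


def sample_initial_centers(points, k, seed=None):
    """Pick k distinct input points at random to act as starting centers."""
    rng = random.Random(seed)
    return rng.sample(list(points), k)


def kmeans(points, k, iterations=10, seed=None):
    """Plain k-means iterations starting from randomly sampled input points."""
    centers = sample_initial_centers(points, k, seed)
    for _ in range(iterations):
        # Assign each point to its nearest current center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its points; keep it if it has none.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


# Illustrative data only.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers = kmeans(data, k=2, seed=0)
```

Because the starting centers are a random sample, different seeds can
give different final clusters on less well-separated data.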
I'm going to move the InputDriver and Mapper to utils since they have
general utility outside of the synthetic control example. The driver can
be run directly from the command line, so you can do that too.
Smooth sailing,
Jeff
On 9/30/10 1:40 AM, Lahiru Samarakoon wrote:
Hi Jeff,
If we do this for Kmeans, how can we specify the k (number of clusters) and
initial seeds for the algorithm?
I understand that canopy is used for this.
Does Mahout have the flexibility to use Kmeans/Fuzzy Kmeans independent of
Canopy by inputting k and initial seeds externally?
Thanks,
Lahiru