I tried to use -k with the syntheticcontrol.kmeans.Job program, but it didn't recognize that argument.
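
To make sure I understand what -k is supposed to do once it is accepted: as I read the description quoted below, k of the input points are picked at random and used as the initial cluster centers. Here is a minimal sketch of that idea in plain Java. The class SeedSampler and the method sampleSeeds are names I made up for illustration and are not part of Mahout, and the use of reservoir sampling is my assumption; I don't know how Mahout actually draws the sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not a Mahout class): picks k input points as initial centers. */
class SeedSampler {

    /**
     * Reservoir sampling: a single pass over the points in which every point
     * has the same probability of ending up among the k seeds.
     */
    static List<double[]> sampleSeeds(Iterable<double[]> points, int k, Random rng) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<double[]> seeds = new ArrayList<>(k);
        int seen = 0;
        for (double[] p : points) {
            seen++;
            if (seeds.size() < k) {
                // Fill the reservoir with the first k points.
                seeds.add(p.clone());
            } else {
                // Replace a random slot with probability k / seen.
                int j = rng.nextInt(seen);
                if (j < k) {
                    seeds.set(j, p.clone());
                }
            }
        }
        if (seeds.size() < k) {
            throw new IllegalArgumentException("fewer than k input points");
        }
        return seeds;
    }

    public static void main(String[] args) {
        // Toy data standing in for the converted input vectors.
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            points.add(new double[] {i, i * 2.0});
        }
        for (double[] s : sampleSeeds(points, 3, new Random(42))) {
            System.out.println(s[0] + ", " + s[1]);
        }
    }
}
```

The k points returned would then serve as the starting centers for the first k-means iteration.
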

On Thu, Sep 30, 2010 at 6:18 AM, Jeff Eastman <[email protected]> wrote:
> Not using the synthetic control jobs. They always run Canopy over the
> converted data and you need to choose t1 and t2 to get the initial k. Once
> you have run it once, however, copy the data file from output into another
> folder. From there you can run k-means or any of the other clustering
> programs on that data using their normal jobs and normal parameters.
>
> When you run k-means on the data, you can supply a -k argument and your
> input points will be randomly sampled to prime the initial cluster centers
> for the subsequent iterations.
>
> I'm going to move the InputDriver and Mapper to utils since it has general
> utility outside of the synthetic control example. Its driver can be run
> directly from the command line and you can do that too.
>
> Smooth sailing,
> Jeff
>
>
> On 9/30/10 1:40 AM, Lahiru Samarakoon wrote:
>>
>> Hi Jeff,
>>
>> If we do this for Kmeans, how can we specify the k (number of clusters)
>> and initial seeds for the algorithm?
>>
>> I understand that Canopy is used for this.
>>
>> Does Mahout have the flexibility to use Kmeans/Fuzzy Kmeans independently of
>> Canopy by inputting k and initial seeds externally?
>>
>> Thanks,
>> Lahiru
>>
>
>

--
Have you thanked a teacher today? ---> http://www.liftateacher.org
