I tried to use -k with the syntheticcontrol.kmeans.Job program, but it
didn't recognize that argument.

On Thu, Sep 30, 2010 at 6:18 AM, Jeff Eastman
<[email protected]> wrote:
>  Not using the synthetic control jobs. They always run Canopy over the
> converted data and you need to choose t1 and t2 to get the initial k. Once
> you have run it once, however, copy the data file from output into another
> folder. From there you can run k-means or any of the other clustering
> programs on that data using their normal jobs and normal parameters.
>
> When you run k-means on the data, you can supply a -k argument and your
> input points will be randomly sampled to prime the initial cluster centers
> for the subsequent iterations.
>
> I'm going to move the InputDriver and Mapper to utils since they have general
> utility outside of the synthetic control example. The driver can be run
> directly from the command line, so you can do that too.
>
> Smooth sailing,
> Jeff
>
>
> On 9/30/10 1:40 AM, Lahiru Samarakoon wrote:
>>
>> Hi Jeff,
>>
>> If we do this for Kmeans, how can we specify the k (number of clusters)
>> and initial seeds for the algorithm?
>>
>> I understand that canopy is used for this.
>>
>> Does Mahout have the flexibility to use Kmeans/Fuzzy Kmeans independent of
>> Canopy by inputting k and initial seeds externally?
>>
>> Thanks,
>> Lahiru
>>
>
>
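
For readers following the thread, the seeding behaviour described in the quoted reply above (input points randomly sampled to prime the initial cluster centers when -k is given) can be sketched roughly as follows. This is a small standalone Python illustration, not Mahout's code: the function names sample_initial_centers and kmeans, the seed parameter, and the fixed iteration count are all assumptions made up for this example.

```python
import random


def sample_initial_centers(points, k, seed=None):
    """Pick k distinct input points uniformly at random as initial centers.

    Hypothetical helper for illustration; not part of Mahout.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    return [list(p) for p in rng.sample(points, k)]


def kmeans(points, k, iterations=10, seed=None):
    """Plain Lloyd-style k-means seeded by random sampling (illustrative only)."""
    centers = sample_initial_centers(points, k, seed)
    for _ in range(iterations):
        # Assign every point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
    print(kmeans(data, k=2, seed=42))
```

Here k is chosen directly rather than derived from Canopy's t1 and t2 thresholds, which is the difference the thread is about.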



-- 
Have you thanked a teacher today? ---> http://www.liftateacher.org
