K-means needs a distance measure.
If you can build a custom distance measure that is consistent with your
definition of centroid, you can run k-means with it.
I am not sure such a measure exists: by your definition, two binary
vectors that are exact complements of each other can belong to the same cluster.
However, take a look at TanimotoDistanceMeasure. It might help.
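
To make the concern concrete, here is a rough sketch in plain Java using
java.util.BitSet. It does not use any Mahout classes. The class and method
names (BinaryOrSketch, tanimotoDistance, orCentroid, bits) are made up for
illustration, and I am assuming that for 0/1 vectors the Tanimoto coefficient
reduces to |a AND b| / |a OR b|.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Mahout.
final class BinaryOrSketch {

    // Tanimoto distance for bit vectors: 1 - |a AND b| / |a OR b|.
    static double tanimotoDistance(BitSet a, BitSet b) {
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionBits = union.cardinality();
        if (unionBits == 0) {
            return 0.0; // both vectors are all zeros; treat them as identical
        }
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        return 1.0 - (double) intersection.cardinality() / unionBits;
    }

    // Centroid as described in the question: the bitwise OR of all points.
    static BitSet orCentroid(List<BitSet> points) {
        BitSet centroid = new BitSet();
        for (BitSet p : points) {
            centroid.or(p);
        }
        return centroid;
    }

    // Builds a BitSet from a string such as "1100" (index 0 is the first char).
    static BitSet bits(String s) {
        BitSet result = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') {
                result.set(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet a = bits("1100");
        BitSet b = bits("0011"); // exact complement of a
        BitSet centroid = orCentroid(Arrays.asList(a, b)); // 1111

        System.out.println(tanimotoDistance(a, b));        // prints 1.0
        System.out.println(tanimotoDistance(a, centroid)); // prints 0.5
        System.out.println(tanimotoDistance(b, centroid)); // prints 0.5
    }
}
```

The two points are as far apart as this measure allows (distance 1.0), yet
both sit at distance 0.5 from their OR centroid. That is the mismatch I mean:
check whether this behaviour is acceptable for your application before
relying on it.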
Good luck,
-- Yuval

On Tue, Jul 17, 2012 at 7:56 AM, Masoud Moshref Javadi <[email protected]> wrote:

>
> I want to run kmeans on binary data and the definition of centroid for
> my application is the Or() of bits of all points inside a cluster.
> Where, in Mahout, should I change?
>
> --
> Masoud Moshref Javadi
> Computer Engineering PhD Student
> Ming Hsieh Department of Electrical Engineering
> University of Southern California
>
>
>
>
>
