For kmeans you need a distance measure. If you can build a custom distance measure that is consistent with your definition of centroid, you can run kmeans with it. I am not sure such a measure exists: by your definition, two binary vectors that are exact opposites of each other can end up with the same centroid. That said, take a look at TanimotoDistanceMeasure. It might help.

Good luck,
-- Yuval

On Tue, Jul 17, 2012 at 7:56 AM, Masoud Moshref Javadi <[email protected]> wrote:

> I want to run kmeans on binary data and the definition of centroid for
> my application is the Or() of bits of all points inside a cluster.
> Where, in Mahout, should I change?
>
> --
> Masoud Moshref Javadi
> Computer Engineering PhD Student
> Ming Hsieh Department of Electrical Engineering
> University of Southern California
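
To make the concern in the reply concrete, here is a rough standalone sketch of the OR-centroid from the question next to a Tanimoto-style distance on bit vectors. It deliberately does not use any Mahout classes: the class name BinaryKMeansSketch and the helpers orCentroid, tanimotoDistance and bits are made up for illustration, and the distance uses the fact that for 0/1 vectors the Tanimoto coefficient reduces to |a AND b| / |a OR b|. Treating two all-zero vectors as distance 0 is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative only: hypothetical helper class, not part of Mahout.
public class BinaryKMeansSketch {

    // OR-centroid as described in the question:
    // a bit is set in the centroid if it is set in any point of the cluster.
    public static BitSet orCentroid(List<BitSet> points) {
        BitSet centroid = new BitSet();
        for (BitSet p : points) {
            centroid.or(p);
        }
        return centroid;
    }

    // Tanimoto distance restricted to binary vectors:
    // 1 - |a AND b| / |a OR b|.
    public static double tanimotoDistance(BitSet a, BitSet b) {
        BitSet union = (BitSet) a.clone();
        union.or(b);
        if (union.isEmpty()) {
            return 0.0; // assumption: two all-zero vectors count as identical
        }
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        return 1.0 - (double) intersection.cardinality() / union.cardinality();
    }

    // Builds a BitSet from a string such as "1100" (index 0 is the leftmost character).
    public static BitSet bits(String s) {
        BitSet result = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') {
                result.set(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet a = bits("1100");
        BitSet b = bits("0011"); // exact opposite of a
        BitSet centroid = orCentroid(Arrays.asList(a, b)); // all four bits set

        System.out.println(tanimotoDistance(a, b));        // prints 1.0
        System.out.println(tanimotoDistance(a, centroid)); // prints 0.5
        System.out.println(tanimotoDistance(b, centroid)); // prints 0.5
    }
}
```

In this toy case the two opposite vectors are as far apart as the distance allows, yet both sit at the same distance from their shared OR-centroid, which is the kind of mismatch between centroid and distance measure the reply is pointing at.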
