Sorry, I think that more commonly, if aggregating transpose is to be used, the centroid assignments should be the keys of the matrix D (so D := A), and the aggregating transpose is performed on the matrix (1 | D)' (i.e., (1 cbind D).t), so that the first row of the result contains the counts of points per cluster and we can finish the centroid recomputation via

    M = (1 | D)'
    C = M(2:, :) with each row Hadamard-divided by the first row of counts M(1, :)

(implying Golub-Van Loan notation for subblocking).

On Wed, Mar 29, 2017 at 9:02 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> The simplest scheme is to initialize a distributed matrix of the shape
> D := (0 | A), where A is your dataset and 0 is a single column indicating
> the current centroid assignment, and to distribute the current centroid
> matrix C via matrix broadcast (assuming there are few enough centers).
>
> Then alternately run cluster assignment within the mapBlock() operator on D,
> recomputing the new centroids C afterwards. Recomputation of centroids can
> be done via aggregating transpose.
>
> Of course, a better scheme includes pre-sketching (k-means ||) and use of
> the triangle inequality during recomputations.
>
> On Wed, Mar 29, 2017 at 8:30 AM, KHATWANI PARTH BHARAT <
> h2016...@pilani.bits-pilani.ac.in> wrote:
>
>> Sir,
>> I am trying to write the k-means clustering algorithm using Mahout Samsara,
>> but I am a bit confused about how to leverage the Distributed Row Matrix
>> for this. Can anybody help me with it?
>>
>> Thanks,
>> Parth Khatwani
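
For illustration only, the assignment and aggregating-transpose steps described at the top of this message can be sketched in plain Python. This is an in-memory stand-in and not Mahout Samsara code: lists of lists replace the distributed row matrix, and the function names (assign, aggregating_transpose, recompute_centroids, kmeans) are hypothetical. It also assumes that a cluster left without points simply keeps its previous centroid, which the thread does not discuss.

```python
def assign(A, C):
    """For each row of A, return the index of the nearest centroid (row of C)."""
    def sqdist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return [min(range(len(C)), key=lambda j: sqdist(x, C[j])) for x in A]


def aggregating_transpose(keys, rows, k):
    """Transpose `rows`, summing all rows that share a key into one column.

    The result M has len(rows[0]) rows and k columns; M[i][j] is the sum of
    column i over every input row whose key is j.
    """
    M = [[0.0] * k for _ in range(len(rows[0]))]
    for key, row in zip(keys, rows):
        for i, value in enumerate(row):
            M[i][key] += value
    return M


def recompute_centroids(keys, A, C_old):
    """Recompute centroids from M = (1 | D)', where D is A keyed by cluster."""
    k = len(C_old)
    ones_A = [[1.0] + list(x) for x in A]        # (1 | D)
    M = aggregating_transpose(keys, ones_A, k)   # first row of M holds counts
    counts = M[0]
    C_new = []
    for j in range(k):
        if counts[j] == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            C_new.append(list(C_old[j]))
        else:
            # Divide the per-cluster sums M(2:, j) by the count M(1, j).
            C_new.append([M[i][j] / counts[j] for i in range(1, len(M))])
    return C_new


def kmeans(A, C, iterations=10):
    """Alternate assignment and centroid recomputation a fixed number of times."""
    for _ in range(iterations):
        keys = assign(A, C)
        C = recompute_centroids(keys, A, C)
    return assign(A, C), C
```

In this sketch the centroids are returned one per row, so the result corresponds to the transpose of M(2:, :) after the division by counts. In a distributed setting the assignment step would run per block with C broadcast, as described in the quoted message.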