Hi, I'd like to implement k-means myself, in the following naive way. Given a large set of vectors:

1. Generate k random centers from the set.
2. Each mapper reads all the centers and a split of the vector set, and emits each vector with its closest center as the key.
3. Each reducer calculates a new center and writes it.
4. Go to step 2 until the centers no longer change.

My question is very basic: how do I distribute all the new centers (produced by the reducers) to all the mappers? I can't use the distributed cache since it's read-only. I can't use context.write since it will create a file for each reduce task, and I need a single file.

The more general issue here is how to distribute data produced by the reducers to all the mappers.

Thanks.
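
For reference, the four steps above can be sketched in a single process, without Hadoop, roughly as below. This is only an illustration of the loop I have in mind, not a MapReduce job: the class name KMeansSketch and its methods are hypothetical, the initial centers are passed in rather than drawn at random (so the run is repeatable), and a center that receives no vectors is simply kept unchanged, which is my own assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process sketch of the naive k-means loop (no Hadoop).
public class KMeansSketch {

    // "Map" side of step 2: index of the center closest to the vector
    // (squared Euclidean distance).
    public static int closestCenter(double[] vector, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < vector.length; d++) {
                double diff = vector[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // "Reduce" side of step 3: the new center is the mean of its vectors.
    public static double[] mean(List<double[]> vectors) {
        double[] sum = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            for (int d = 0; d < sum.length; d++) {
                sum[d] += v[d];
            }
        }
        for (int d = 0; d < sum.length; d++) {
            sum[d] /= vectors.size();
        }
        return sum;
    }

    // One pass of steps 2 and 3: group vectors by closest center, then average.
    public static double[][] iterate(double[][] vectors, double[][] centers) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] v : vectors) {
            int key = closestCenter(v, centers);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(v);
        }
        double[][] next = new double[centers.length][];
        for (int c = 0; c < centers.length; c++) {
            List<double[]> members = groups.get(c);
            // Assumption: a center with no assigned vectors stays where it is.
            next[c] = (members == null) ? centers[c].clone() : mean(members);
        }
        return next;
    }

    // Step 4: repeat until the centers stop changing (or maxIterations is hit).
    public static double[][] run(double[][] vectors, double[][] initialCenters,
                                 int maxIterations) {
        double[][] centers = initialCenters;
        for (int i = 0; i < maxIterations; i++) {
            double[][] next = iterate(vectors, centers);
            if (Arrays.deepEquals(next, centers)) {
                break;
            }
            centers = next;
        }
        return centers;
    }
}
```

In the distributed version, the value returned by iterate is exactly what would have to reach every mapper before the next pass, which is the part I am asking about.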
