Hi,
I am trying to implement the gap statistic, which aims to determine the
actual number of clusters for KMeans, on Spark. Given a range of possible
Ks (e.g. up to K = 10), Gap will run KMeans for each K in the range. Since
computing KMeans for K=10 takes more time than for K=1,2,3,4 together, I would
Hi,
As part of our project, we are trying to create a parallel implementation
of the BIRCH clustering algorithm [1]. We are mostly taking our ideas on how
to do it from this paper, which used CUDA to parallelize BIRCH [2]. ([2] is
a short paper; only section 4 is relevant.)
We would like to implement