Python custom partitioning

2016-01-11 Thread Dženan Softić
Hi, I am trying to implement the Gap statistic on Spark, which aims to determine the actual number of clusters for KMeans. Given a range of possible Ks (e.g. K = 10), Gap will run KMeans for each K in the range. Since computing KMeans for K=10 takes more time than for K=1,2,3,4 together, I would

BIRCH clustering algorithm

2015-12-14 Thread Dženan Softić
Hi, As part of a project, we are trying to create a parallel implementation of the BIRCH clustering algorithm [1]. We are mostly taking the idea of how to do it from this paper, which used CUDA to make BIRCH parallel [2]. ([2] is a short paper; only section 4 is relevant.) We would like to implement