It's the Voronoi iteration algorithm more or less verbatim, which I initially 
read as an alternative implementation of PAM rather than an alternative to PAM. 
I'll add the following bits to it and then put it up on GitHub if there's any 
interest:

  1.  I'll pass the distance calculation in as the operand of an adverb, so 
distances are computed on demand and the whole n-by-n distance matrix (which is 
never used in its entirety) never has to be instantiated. This will give it 
better space complexity. I'll try to make it so that it can be run on, say, 1M 
data points.
  2.  I'll add kmeans++ initialisation which should perform much better than 
random.
  3.  I'll use a Gaussian mixture model with spherical variance parameterised 
by the medoids and sample variance in-cluster as a working model to assign 
likelihood to fitted data.
  4.  I'll add repeated initialisations, i.e. repeat the clustering N times and 
keep the run with the best likelihood.
  5.  I'll add a k discovery routine which uses AIC and the GMM likelihood to 
find k.
  6.  I'll include benchmarks against kmeans, kmedoids, affinity propagation 
and a couple of other sklearn implementations on a couple of datasets.
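
To make points 1 and 2 concrete, here is a rough sketch in plain Python rather 
than J. It is only an illustration under my own assumptions, not the code I'll 
post: the names (sq_euclid, seed_plus_plus, assign, voronoi_kmedoids) are made 
up, and the distance function is passed in as an argument, which is the nearest 
Python analogue of an adverb. No n-by-n matrix is ever built; distances are 
recomputed when needed.

```python
import random

def sq_euclid(a, b):
    """Squared Euclidean distance between two points (sequences of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seed_plus_plus(X, k, dist, rng):
    """k-means++ style seeding: draw each new medoid with probability
    proportional to dist(point, nearest medoid chosen so far)."""
    medoids = [rng.randrange(len(X))]
    closest = [dist(x, X[medoids[0]]) for x in X]
    while len(medoids) < k:
        if sum(closest) > 0:
            nxt = rng.choices(range(len(X)), weights=closest)[0]
        else:  # every point coincides with a medoid: fall back to uniform
            nxt = rng.randrange(len(X))
        medoids.append(nxt)
        closest = [min(c, dist(x, X[nxt])) for c, x in zip(closest, X)]
    return medoids

def assign(X, medoids, dist):
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(x, X[medoids[j]]))
            for x in X]

def voronoi_kmedoids(X, k, dist=sq_euclid, max_iter=100, seed=0):
    """Voronoi iteration: alternate nearest-medoid assignment with
    re-picking the medoid inside each cell, until nothing changes."""
    rng = random.Random(seed)
    medoids = seed_plus_plus(X, k, dist, rng)
    for _ in range(max_iter):
        labels = assign(X, medoids, dist)
        new = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # empty cell: keep the old medoid
                new.append(medoids[j])
                continue
            # the member with the smallest total distance to its own cell
            new.append(min(members,
                           key=lambda i: sum(dist(X[i], X[m]) for m in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, assign(X, medoids, dist)
```

With sq_euclid as the distance, the seeding weights are squared distances, 
which matches the usual kmeans++ rule. The medoid update here is quadratic in 
the cell size in time but linear in space, which is the trade-off I have in 
mind for large inputs.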

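For points 3 and 5, the likelihood and AIC could look roughly like the sketch 
below, again in plain Python and again only under my own assumptions: the 
mixing weights are taken to be the cluster size fractions, the log-likelihood 
is that of the full mixture density, a small variance floor guards against 
zero-variance clusters, and the parameter count treats medoids like free means. 
The names spherical_gmm_loglik and aic are placeholders.

```python
import math

def spherical_gmm_loglik(X, centres, labels, var_floor=1e-12):
    """Log-likelihood of X under a spherical Gaussian mixture whose means
    are the given centres (e.g. medoids) and whose per-cluster variance is
    the in-cluster mean squared distance to the centre, per dimension."""
    n, d, k = len(X), len(X[0]), len(centres)
    sizes = [0] * k
    sq = [0.0] * k
    for x, lab in zip(X, labels):
        sizes[lab] += 1
        sq[lab] += sum((a - b) ** 2 for a, b in zip(x, centres[lab]))
    live = [j for j in range(k) if sizes[j] > 0]  # skip empty clusters
    var = {j: max(sq[j] / (sizes[j] * d), var_floor) for j in live}
    log_w = {j: math.log(sizes[j] / n) for j in live}
    total = 0.0
    for x in X:
        logs = []
        for j in live:
            r2 = sum((a - b) ** 2 for a, b in zip(x, centres[j]))
            logs.append(log_w[j]
                        - 0.5 * d * math.log(2 * math.pi * var[j])
                        - r2 / (2 * var[j]))
        m = max(logs)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total

def aic(loglik, k, d):
    """AIC = 2p - 2 log L with p = k*d means + k variances + (k-1) weights
    (an assumed parameter count; medoids are not really free parameters)."""
    p = k * d + k + (k - 1)
    return 2 * p - 2 * loglik
```

The k discovery routine would then fit for a range of k and keep the k with 
the lowest AIC, and the repeated initialisations in point 4 would keep the run 
with the highest log-likelihood for a given k.
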
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
