Hi all,

In my team, we are currently developing a fork of Spark MLlib that extends the
K-means method so that users can set their own distance function. In this
implementation, a distance function with the signature
(VectorWithNorm, VectorWithNorm) => Double can be passed directly as an
argument to the K-means train function.
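
To make the idea concrete, here is a minimal sketch in plain Scala with no Spark
dependency. It is illustrative only, under stated assumptions. MLlib's own
VectorWithNorm is internal to the library, so a hypothetical stand-in case class
is defined here. The names KMeansDistanceSketch, withNorm, closestCenter and
train are made up for this example. It shows a local Lloyd loop whose
assignment step calls whatever distance function it is given.

```scala
// Hypothetical stand-in for MLlib's internal VectorWithNorm:
// a dense vector plus its precomputed L2 norm.
final case class VectorWithNorm(values: Array[Double], norm: Double)

object KMeansDistanceSketch {
  // The distance signature from the proposal.
  type DistanceFn = (VectorWithNorm, VectorWithNorm) => Double

  // Hypothetical helper: wraps raw coordinates and computes the L2 norm once.
  def withNorm(values: Double*): VectorWithNorm = {
    val arr = values.toArray
    VectorWithNorm(arr, math.sqrt(arr.map(x => x * x).sum))
  }

  // Squared Euclidean distance, the measure used by stock K-means.
  val squaredEuclidean: DistanceFn = (a, b) =>
    a.values.zip(b.values).map { case (x, y) => (x - y) * (x - y) }.sum

  // Cosine distance; reuses the cached norms (assumes non-zero vectors).
  val cosine: DistanceFn = (a, b) => {
    val dot = a.values.zip(b.values).map { case (x, y) => x * y }.sum
    1.0 - dot / (a.norm * b.norm)
  }

  // Index of the center closest to `point` under the supplied distance.
  def closestCenter(point: VectorWithNorm, centers: Seq[VectorWithNorm],
                    distance: DistanceFn): Int =
    centers.indices.minBy(i => distance(point, centers(i)))

  // Hypothetical local train loop: Lloyd iterations with a pluggable distance.
  def train(points: Seq[VectorWithNorm], initialCenters: Seq[VectorWithNorm],
            maxIterations: Int, distance: DistanceFn): Seq[VectorWithNorm] = {
    var centers = initialCenters
    for (_ <- 1 to maxIterations) {
      // Assignment step: group points by their closest center.
      val assigned = points.groupBy(p => closestCenter(p, centers, distance))
      // Update step: each center becomes the mean of its assigned points.
      centers = centers.indices.map { i =>
        assigned.get(i) match {
          case Some(members) =>
            val dim = members.head.values.length
            val mean = Array.tabulate(dim)(d => members.map(_.values(d)).sum / members.size)
            withNorm(mean: _*)
          case None => centers(i) // an empty cluster keeps its previous center
        }
      }
    }
    centers
  }
}
```

Note that the update step still uses the arithmetic mean. The mean minimizes the
within-cluster cost for squared Euclidean distance and, more generally, for
Bregman divergences, but other distance functions may need a different update
step.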

We have found the JIRA issue SPARK-11665, which proposes supporting new
distances in bisecting K-means. There is also the JIRA issue SPARK-3219, which
proposed adding Bregman divergences as distance functions, but it has not been
merged into MLlib. We are therefore wondering whether such an extension of the
MLlib K-means algorithm would be welcomed by the community and would have a
chance of being included in future Spark releases.

Regards,

Simon Nanty
