Hi all,

In my team, we are currently developing a fork of Spark MLlib that extends the K-means implementation so that users can supply their own distance function. In this implementation, a distance function with the signature (VectorWithNorm, VectorWithNorm) => Double can be passed directly as an argument to the K-means train function.
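
To make the idea concrete, here is a minimal sketch of what such a distance function could look like. It is only an illustration, not the code from our fork: VectorWithNorm below is a simplified, hypothetical stand-in for the MLlib class of the same name, and DistanceSketch, DistanceFunction, withNorm, manhattan, cosine and closestCenter are names made up for this example.

```scala
object DistanceSketch {
  // Hypothetical stand-in for MLlib's VectorWithNorm:
  // a dense vector together with its precomputed L2 norm.
  final case class VectorWithNorm(vector: Array[Double], norm: Double)

  // The signature described above.
  type DistanceFunction = (VectorWithNorm, VectorWithNorm) => Double

  // Helper (assumed for this sketch) that computes the L2 norm once.
  def withNorm(values: Array[Double]): VectorWithNorm =
    VectorWithNorm(values, math.sqrt(values.map(v => v * v).sum))

  // Manhattan (L1) distance: sum of absolute coordinate differences.
  val manhattan: DistanceFunction = (a, b) =>
    a.vector.zip(b.vector).map { case (x, y) => math.abs(x - y) }.sum

  // Cosine distance, reusing the cached norms.
  // Assumes both vectors are non-zero.
  val cosine: DistanceFunction = (a, b) => {
    val dot = a.vector.zip(b.vector).map { case (x, y) => x * y }.sum
    1.0 - dot / (a.norm * b.norm)
  }

  // Assignment step of K-means with a pluggable distance:
  // returns the index of the center closest to the point.
  def closestCenter(
      point: VectorWithNorm,
      centers: Seq[VectorWithNorm],
      distance: DistanceFunction): Int =
    centers.indices.minBy(i => distance(point, centers(i)))
}
```

With this shape, switching from one distance to another only changes the function value passed in, for example closestCenter(point, centers, manhattan) versus closestCenter(point, centers, cosine).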

We found the JIRA issue SPARK-11665, which proposes supporting new distances in bisecting K-means. There is also the JIRA issue SPARK-3219, which proposes adding Bregman divergences as distance functions, but it has not been added to MLlib. We are therefore wondering whether such an extension of the MLlib K-means algorithm would be welcomed by the community and would have a chance of being included in a future Spark release.

Regards,
Simon Nanty