Re: Possible contribution to MLlib

2016-06-21 Thread Jeff Zhang
I think it would be valuable to make the distance function pluggable and also
to provide some built-in distance functions. This might also be useful for
algorithms other than KMeans.

On Tue, Jun 21, 2016 at 7:48 PM, Simon NANTY wrote:

> Hi all,
>
> In my team, we are currently developing a fork of Spark MLlib that extends
> the K-means method so that users can set their own distance function. In
> this implementation, a distance function with the signature
> (VectorWithNorm, VectorWithNorm) => Double could be passed directly as an
> argument to the K-means train function.
>
> We found the JIRA issue SPARK-11665, which proposes supporting new
> distances in bisecting K-means. There is also the JIRA issue SPARK-3219,
> which proposes adding Bregman divergences as distance functions, but it has
> not been added to MLlib. We are therefore wondering whether such an
> extension of the MLlib K-means algorithm would be welcomed by the community
> and would have a chance of being included in future Spark releases.
>
> Regards,
>
> Simon Nanty



-- 
Best Regards

Jeff Zhang


Possible contribution to MLlib

2016-06-21 Thread Simon NANTY
Hi all,

In my team, we are currently developing a fork of Spark MLlib that extends the 
K-means method so that users can set their own distance function. In this 
implementation, a distance function with the signature 
(VectorWithNorm, VectorWithNorm) => Double could be passed directly as an 
argument to the K-means train function.
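The idea of passing the distance as an argument can be sketched in plain Scala, without a Spark dependency. This is a minimal sketch under stated assumptions: `VectorWithNorm` below is a hypothetical stand-in for MLlib's internal class of the same name (a dense array plus its cached L2 norm), and `PluggableKMeans.train` is a hypothetical signature, not Spark's `KMeans.train`. The distance is simply a value of type `(VectorWithNorm, VectorWithNorm) => Double` that the assignment step calls. The centroid update still uses the arithmetic mean, which is the optimal center for squared Euclidean distance and other Bregman divergences but not for every distance (for Manhattan distance, the component-wise median would be).

```scala
// Hypothetical stand-in for MLlib's internal VectorWithNorm:
// a dense vector together with its cached L2 norm.
final case class VectorWithNorm(vector: Array[Double], norm: Double)

object VectorWithNorm {
  // Convenience constructor that computes the L2 norm.
  def apply(v: Array[Double]): VectorWithNorm =
    VectorWithNorm(v, math.sqrt(v.map(x => x * x).sum))
}

object PluggableKMeans {
  // The pluggable distance signature described in the message.
  type Distance = (VectorWithNorm, VectorWithNorm) => Double

  // Squared Euclidean distance using the cached norms:
  // ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b (clamped at 0 against rounding).
  val squaredEuclidean: Distance = (a, b) => {
    val dot = a.vector.indices.map(i => a.vector(i) * b.vector(i)).sum
    math.max(0.0, a.norm * a.norm + b.norm * b.norm - 2.0 * dot)
  }

  // Index of the center closest to p under the supplied distance.
  def closest(p: VectorWithNorm, centers: Array[VectorWithNorm], dist: Distance): Int =
    centers.indices.minBy(i => dist(p, centers(i)))

  // Component-wise arithmetic mean of a non-empty group of points.
  private def mean(points: Seq[VectorWithNorm]): VectorWithNorm = {
    val dim = points.head.vector.length
    val acc = new Array[Double](dim)
    for (p <- points; i <- 0 until dim) acc(i) += p.vector(i)
    VectorWithNorm(acc.map(_ / points.size))
  }

  // Lloyd's iterations with a caller-supplied distance function.
  // Hypothetical signature for illustration, not Spark's KMeans.train.
  def train(data: Seq[VectorWithNorm],
            initial: Array[VectorWithNorm],
            maxIter: Int,
            dist: Distance): Array[VectorWithNorm] = {
    var centers = initial
    for (_ <- 0 until maxIter) {
      // Assignment step: group points by their closest center.
      val groups = data.groupBy(p => closest(p, centers, dist))
      // Update step: recompute each center; keep it if no point was assigned.
      centers = centers.indices
        .map(i => groups.get(i).map(mean).getOrElse(centers(i)))
        .toArray
    }
    centers
  }
}
```

Any lambda with that signature can be passed as `dist`, for example a Manhattan distance `(a, b) => a.vector.zip(b.vector).map { case (x, y) => math.abs(x - y) }.sum`, keeping in mind the caveat above about the mean update.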

We found the JIRA issue SPARK-11665, which proposes supporting new distances 
in bisecting K-means. There is also the JIRA issue SPARK-3219, which proposes 
adding Bregman divergences as distance functions, but it has not been added to 
MLlib. We are therefore wondering whether such an extension of the MLlib 
K-means algorithm would be welcomed by the community and would have a chance 
of being included in future Spark releases.
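For reference, a Bregman divergence is generated by a strictly convex, differentiable function φ as D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩. Choosing φ(x) = ‖x‖² gives squared Euclidean distance, and φ(x) = Σ x_i log x_i gives the generalized Kullback-Leibler (I-)divergence. Because the cluster mean minimizes the total Bregman divergence D_φ(point, center) over a cluster, these divergences fit K-means without changing the centroid update. The sketch below builds such distances with the `(VectorWithNorm, VectorWithNorm) => Double` signature from the message; `VectorWithNorm` is again a hypothetical stand-in for MLlib's internal class (its `norm` field is unused here), and `Bregman.divergence` is an illustrative helper, not an MLlib API.

```scala
// Hypothetical stand-in for MLlib's internal VectorWithNorm.
final case class VectorWithNorm(vector: Array[Double], norm: Double)

object Bregman {
  // The pluggable distance signature described in the message.
  type Distance = (VectorWithNorm, VectorWithNorm) => Double

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  // Generic Bregman divergence:
  // D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
  def divergence(phi: Array[Double] => Double,
                 gradPhi: Array[Double] => Array[Double]): Distance =
    (x, y) => {
      val diff = x.vector.indices.map(i => x.vector(i) - y.vector(i)).toArray
      phi(x.vector) - phi(y.vector) - dot(gradPhi(y.vector), diff)
    }

  // phi(x) = ||x||^2, grad = 2x: yields squared Euclidean distance.
  val squaredEuclidean: Distance =
    divergence(v => dot(v, v), v => v.map(2.0 * _))

  // phi(x) = sum x_i log x_i, grad_i = log x_i + 1: yields the generalized
  // KL divergence sum(x_i log(x_i / y_i) - x_i + y_i). Entries must be positive.
  val generalizedKL: Distance =
    divergence(v => v.map(x => x * math.log(x)).sum,
               v => v.map(x => math.log(x) + 1.0))
}
```

Either value can be handed to a K-means implementation that accepts a distance with this signature; other choices of φ (for example −Σ log x_i, which gives the Itakura-Saito divergence) follow the same pattern.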

Regards,

Simon Nanty