GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/19340
[SPARK-22119] Add cosine distance to KMeans
## What changes were proposed in this pull request?
Currently, KMeans assumes that the only possible distance measure is the
Euclidean distance. This PR adds support for the cosine distance to the KMeans
algorithm.
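The difference between the two measures can be illustrated with a small, self-contained Scala sketch. It is not the code in this PR and does not use Spark's API. The object `CosineKMeansSketch` and its helpers (`squaredEuclidean`, `cosineDistance`, `closestCenter`) are hypothetical names chosen for illustration. The sketch assumes dense `Array[Double]` vectors and the usual definition of cosine distance as `1 - cos(theta)`. It shows how the measure used in the KMeans assignment step changes which center a point is assigned to.

```scala
// Hypothetical sketch for illustration only; not Spark's implementation.
object CosineKMeansSketch {
  type Vec = Array[Double]

  // Dot product of two vectors of equal dimension.
  def dot(a: Vec, b: Vec): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    a.indices.map(i => a(i) * b(i)).sum
  }

  // Euclidean (L2) norm.
  def norm(a: Vec): Double = math.sqrt(dot(a, a))

  // Squared Euclidean distance: depends on both direction and magnitude.
  def squaredEuclidean(a: Vec, b: Vec): Double =
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum

  // Cosine distance, 1 - cos(theta): depends only on direction.
  // Values lie in [0, 2]; undefined when either vector is zero.
  def cosineDistance(a: Vec, b: Vec): Double = {
    val na = norm(a)
    val nb = norm(b)
    require(na > 0 && nb > 0, "cosine distance is undefined for zero vectors")
    1.0 - dot(a, b) / (na * nb)
  }

  // Assignment step of Lloyd's algorithm: index of the center
  // closest to `point` under the supplied distance function.
  def closestCenter(point: Vec, centers: Seq[Vec], dist: (Vec, Vec) => Double): Int =
    centers.indices.minBy(i => dist(point, centers(i)))
}
```

For example, take the point `(10, 0)` and the centers `(1, 0)` and `(8, 6)`. Under squared Euclidean distance the point is closer to `(8, 6)` (40 vs. 81). Under cosine distance it is closer to `(1, 0)` (0 vs. 0.2), because the two vectors point in the same direction. The same data can therefore produce different clusterings depending on the measure.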
## How was this patch tested?
Existing UTs, plus new UTs for the cosine distance.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-22119
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19340.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19340
----
commit d679adde78426d5517a15640d632ef0588a6b249
Author: Marco Gaido <[email protected]>
Date: 2017-09-11T14:43:53Z
Add the distanceSuite parameter and the cosine distance impl to mllib.KMeans
commit c364ae3c66740cbbfe763f7e4fd10a8abd02ced8
Author: Marco Gaido <[email protected]>
Date: 2017-09-25T12:07:17Z
Add distance measure to ml Kmeans
commit 0e2a9ee48c46f51f55b09f9d60d47ac3325676bd
Author: Marco Gaido <[email protected]>
Date: 2017-09-25T14:40:37Z
Add tests for cosine
commit d8d8c642345b6a7815998ab5409a5e4cd86d9a5d
Author: Marco Gaido <[email protected]>
Date: 2017-09-25T15:05:32Z
fix scalastyle
----