GitHub user MLnick commented on the issue:
https://github.com/apache/spark/pull/20629
Right, so while it's perhaps a lower-quality metric, it is a different one.
So I wonder whether deprecation is the right approach (vs., say, putting the
within-cluster sum of squares into ClusteringEvaluator).
On Sun, 18 Feb 2018 at 20:35, Marco Gaido <[email protected]> wrote:
> Thanks for taking a look at this, @MLnick <https://github.com/mlnick>. No,
> it doesn't, in the sense that it returns a different result: computeCost
> returns the sum of the squared Euclidean distances between each point and
> the centroid of the cluster it is assigned to, while the silhouette metric
> is the average of the silhouette coefficients of all points. So they are
> completely different formulas.
>
> The semantics are a bit different too. Silhouette measures both the
> cohesion and the separation of the clusters, while computeCost, as it
> stands, measures only cohesion.
>
> Nonetheless, both of them can of course be used to evaluate the result of
> a clustering algorithm, even though the silhouette is much better for this
> purpose.
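
To make the difference between the two formulas in the quoted message concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark code: the function names (`within_cluster_sse`, `mean_silhouette`) and the toy data are hypothetical, and the silhouette follows the textbook definition with plain Euclidean distance, which is an assumption and not necessarily what ClusteringEvaluator computes.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid.

    This only looks at how tight each cluster is (cohesion).
    """
    clusters = group_by_label(points, labels)
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


def mean_silhouette(points, labels):
    """Average silhouette coefficient (textbook form, Euclidean distance).

    For each point: a = mean distance to the other points of its own cluster,
    b = smallest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    Assumes at least two clusters.
    """
    clusters = group_by_label(points, labels)
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        own = [q for j, (q, l) in enumerate(zip(points, labels))
               if l == label and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.sqrt(sq_dist(point, q)) for q in own) / len(own)
        b = min(
            sum(math.sqrt(sq_dist(point, q)) for q in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight pairs of points, far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]   # each tight pair is its own cluster
bad_labels = [0, 1, 0, 1]    # each cluster straddles both pairs

good_sse = within_cluster_sse(points, good_labels)
bad_sse = within_cluster_sse(points, bad_labels)
good_sil = mean_silhouette(points, good_labels)
bad_sil = mean_silhouette(points, bad_labels)

print("SSE:       ", good_sse, "vs", bad_sse)
print("silhouette:", good_sil, "vs", bad_sil)
```

Both numbers prefer the first labeling here, but they live on different scales: the sum of squares is unbounded and depends on the units of the data, while the silhouette stays within [-1, 1] and also penalizes clusters that sit close to each other.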