Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/20629
Sorry, I mean putting the metric in the evaluator and then also deprecating
computeCost.
On Sun, 18 Feb 2018 at 20:41, Nick Pentreath <[email protected]>
wrote:
> Right - so while it's perhaps a lower quality metric, it is different. So I
> wonder if deprecation is the right approach (vs. say putting the
> within-cluster sum of squares into ClusteringEvaluator).
>
> On Sun, 18 Feb 2018 at 20:35, Marco Gaido <[email protected]>
> wrote:
>
>> Thanks for taking a look at this @MLnick <https://github.com/mlnick>.
>> No, it doesn't, in the sense that it returns a different result: this is
>> the sum of the squared Euclidean distances between each point and the
>> centroid of the cluster it is assigned to, while the silhouette metric is
>> the average of the silhouette coefficient. So they are completely
>> different formulas.
>>
>> The semantics are a bit different too. Silhouette measures both cohesion
>> and separation of the clusters, while computeCost, as it stands, measures
>> only cohesion.
>>
>> Nonetheless, both of them can of course be used to evaluate the result of
>> a clustering algorithm, even though the silhouette is much better for this
>> purpose.
>>
>
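
To make the difference between the two metrics discussed in this thread concrete, here is a minimal sketch in plain Python. It is not Spark code: the function names `wssse` and `mean_silhouette` and the toy data are made up for illustration, and the silhouette uses the textbook definition with plain Euclidean distance, which is an assumption and not necessarily the distance measure ClusteringEvaluator uses.

```python
from math import dist


def _group(points, labels):
    """Group points by their cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def wssse(points, labels):
    """Sum of squared Euclidean distances from each point to the centroid
    of its assigned cluster (cohesion only)."""
    total = 0.0
    for members in _group(points, labels).values():
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def mean_silhouette(points, labels):
    """Average silhouette coefficient (cohesion and separation).
    Assumes at least two clusters."""
    clusters = _group(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight pairs of points, far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # matches the obvious grouping
bad_labels = [0, 1, 0, 1]   # pairs each point with a distant one

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, wssse(points, labels), mean_silhouette(points, labels))
```

Lower is better for the sum of squared distances, while the silhouette lies in [-1, 1] and higher is better, so the two numbers cannot be compared directly even though both rank the first assignment above the second.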