GitHub user MLnick commented on the issue:

    https://github.com/apache/spark/pull/20629
  
    Right, so while it’s perhaps a lower-quality metric, it is a different one.
    So I wonder if deprecation is the right approach (versus, say, putting the
    within-cluster sum of squares into ClusteringEvaluator).
    
    On Sun, 18 Feb 2018 at 20:35, Marco Gaido wrote:
    
    > Thanks for taking a look at this @MLnick <https://github.com/mlnick>. No,
    > it doesn't, in the sense that it returns a different result: this is the
    > sum of the squared Euclidean distances between each point and the centroid
    > of the cluster it is assigned to, while the silhouette metric is the
    > average of the per-point silhouette coefficients. So they are completely
    > different formulas.
    >
    > The semantics are a bit different too. Silhouette measures both cohesion
    > and separation of the clusters, while computeCost as it stands measures
    > only cohesion.
    >
    > Nonetheless, both of them can of course be used to evaluate the result of
    > a clustering algorithm, even though the silhouette is much better for this
    > purpose.
    >
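
To make the difference between the two formulas concrete, here is a minimal standalone sketch in plain Python (not Spark code). The function names, the toy data, and the use of plain Euclidean distance for the silhouette are all assumptions for illustration; the distance measure Spark's ClusteringEvaluator uses may differ.

```python
from math import dist  # available in Python 3.8+

def wssse(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances (cohesion only)."""
    return sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def mean_silhouette(points, labels):
    """Average silhouette coefficient (cohesion and separation).

    Assumes at least two clusters and plain Euclidean distance.
    """
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        # a: mean distance to the other points of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Hypothetical toy data: two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = {0: (0.0, 1.0), 1: (10.0, 1.0)}

cost = wssse(points, labels, centroids)
silhouette = mean_silhouette(points, labels)
print(cost, silhouette)
```

The cost depends only on how close the points are to their own centroid, while the silhouette also changes when the clusters move closer together or further apart, which is the cohesion versus separation distinction described above.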


