Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/20629
  
    Sorry, I mean putting the metric in the evaluator and then also deprecating
    computeCost.
    On Sun, 18 Feb 2018 at 20:41, Nick Pentreath <[email protected]>
    wrote:
    
    > Right - so while it’s perhaps a lower quality metric, it is different. So I
    > wonder if deprecation is the right approach (vs. say putting the
    > within-cluster sum of squares into ClusteringEvaluator).
    >
    > On Sun, 18 Feb 2018 at 20:35, Marco Gaido <[email protected]>
    > wrote:
    >
    >> Thanks for taking a look at this @MLnick <https://github.com/mlnick>.
    >> No, it doesn't, in the sense that it returns a different result: this is
    >> the sum of the squared Euclidean distances between each point and the
    >> centroid of the cluster it is assigned to, while the silhouette metric is
    >> the average of the silhouette coefficients. So they are completely
    >> different formulas.
    >>
    >> The semantics are a bit different too. Silhouette measures both cohesion
    >> and separation of the clusters, while computeCost, as it stands, measures
    >> only cohesion.
    >>
    >> Nonetheless, both of them can of course be used to evaluate the result of
    >> a clustering algorithm, even though the silhouette is much better for this
    >> purpose.
    >
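To make the difference between the two metrics discussed above concrete, here is a minimal pure-Python sketch. It is not Spark code: the helper names (`sq_dist`, `group_by_label`, `wssse`, `mean_silhouette`) are made up for illustration, and using the squared Euclidean distance for the silhouette is an assumption chosen to match the distance in the cost formula. It needs at least two clusters and non-degenerate data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    return groups


def wssse(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(sq_dist(p, centroid) for p in members)
    return total


def mean_silhouette(points, labels):
    """Average silhouette coefficient (squared Euclidean distance assumed)."""
    groups = group_by_label(points, labels)
    scores = []
    for p, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's distance to itself is 0, so dividing by len - 1 excludes it)
        a = sum(sq_dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(sq_dist(p, q) for q in other) / len(other)
            for other_label, other in groups.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two toy datasets: identical cluster shapes, different gap between clusters.
labels = [0, 0, 1, 1]
near = [(0, 0), (0, 1), (2, 0), (2, 1)]
far = [(0, 0), (0, 1), (10, 0), (10, 1)]

print(wssse(near, labels), wssse(far, labels))
print(mean_silhouette(near, labels), mean_silhouette(far, labels))
```

On these toy points both datasets get the same cost (1.0), since the clusters are equally tight, while the mean silhouette goes from about 0.78 to about 0.99 as the clusters move apart. That is the cohesion-only versus cohesion-plus-separation distinction in practice.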



