Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20396#discussion_r163940257
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala ---
    @@ -111,6 +129,53 @@ object ClusteringEvaluator
     }
     
     
    +private[evaluation] class Silhouette {
    +
    +  /**
    +   * Computes the Silhouette coefficient for a point.
    +   */
    +  def pointSilhouetteCoefficient(
    +      clusterIds: Set[Double],
    +      pointClusterId: Double,
    +      pointClusterNumOfPoints: Long,
    +      averageDistanceToCluster: (Double) => Double): Double = {
    +    // Here we compute the average dissimilarity of the current point to any cluster of which the
    +    // point is not a member.
    +    // The cluster with the lowest average dissimilarity - i.e. the nearest cluster to the current
    +    // point - is said to be the "neighboring cluster".
    +    var neighboringClusterDissimilarity = Double.MaxValue
    +    clusterIds.foreach {
    --- End diff --
    
    What about `clusterIds.filter(_ != pointClusterId).map(averageDistanceToCluster).min` (except that it needs to deal with the case where `filter` returns no elements)?

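    A minimal sketch of the suggested rewrite, with the empty-`filter` case handled explicitly. The object and method names here are hypothetical (not from the PR); the parameters mirror the signature of `pointSilhouetteCoefficient` in the diff above:

    ```scala
    object SilhouetteSketch {

      // Average dissimilarity to the nearest cluster the point does not
      // belong to, replacing the mutable foreach loop in the diff.
      def neighboringDissimilarity(
          clusterIds: Set[Double],
          pointClusterId: Double,
          averageDistanceToCluster: Double => Double): Double = {
        val otherClusters = clusterIds.filter(_ != pointClusterId)
        // Guard against the case raised in the review: if the point's
        // cluster is the only one, `filter` is empty and `min` would throw.
        if (otherClusters.isEmpty) Double.MaxValue
        else otherClusters.map(averageDistanceToCluster).min
      }
    }
    ```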

---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
