Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20396#discussion_r163940257
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -111,6 +129,53 @@ object ClusteringEvaluator
}
+private[evaluation] class Silhouette {
+
+ /**
+ * Computes the Silhouette coefficient for a point.
+ */
+ def pointSilhouetteCoefficient(
+ clusterIds: Set[Double],
+ pointClusterId: Double,
+ pointClusterNumOfPoints: Long,
+ averageDistanceToCluster: (Double) => Double): Double = {
+ // Here we compute the average dissimilarity of the current point to
+ // any cluster of which the point is not a member.
+ // The cluster with the lowest average dissimilarity - i.e. the
+ // nearest cluster to the current point - is said to be the
+ // "neighboring cluster".
+ var neighboringClusterDissimilarity = Double.MaxValue
+ clusterIds.foreach {
--- End diff --
What about `clusterIds.filter(_ !=
pointClusterId).map(averageDistanceToCluster).min` (except that it needs to
deal with the case where `filter` returns no elements)?
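As a rough sketch of the suggestion (method name and standalone signature are hypothetical, for illustration only), the mutable loop could be replaced with a filter/map/min pipeline, falling back when the point's cluster is the only one:

```scala
object SilhouetteSketch {
  // Hypothetical helper illustrating the reviewer's suggestion: compute the
  // dissimilarity to the neighboring cluster functionally, guarding against
  // the case where filtering out the point's own cluster leaves no elements.
  def neighboringClusterDissimilarity(
      clusterIds: Set[Double],
      pointClusterId: Double,
      averageDistanceToCluster: Double => Double): Double = {
    val otherClusters = clusterIds.filter(_ != pointClusterId)
    if (otherClusters.isEmpty) {
      // No other cluster exists; keep the loop's initial sentinel value.
      Double.MaxValue
    } else {
      otherClusters.map(averageDistanceToCluster).min
    }
  }
}
```

This keeps the same result as the `var` + `foreach` version while making the empty-filter edge case explicit.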
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]