Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20396#discussion_r164112835
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala
---
@@ -111,6 +129,46 @@ object ClusteringEvaluator
}
+private[evaluation] abstract class Silhouette {
+
+ /**
+ * It computes the Silhouette coefficient for a point.
+ */
+ def pointSilhouetteCoefficient(
+ clusterIds: Set[Double],
+ pointClusterId: Double,
+ pointClusterNumOfPoints: Long,
+ averageDistanceToCluster: (Double) => Double): Double = {
+ // Here we compute the average dissimilarity of the current point to any cluster of which the
+ // point is not a member.
+ // The cluster with the lowest average dissimilarity - i.e. the nearest cluster to the current
+ // point - is said to be the "neighboring cluster".
+ val otherClusterIds = clusterIds.filter(_ != pointClusterId)
+ val neighboringClusterDissimilarity = otherClusterIds.map(averageDistanceToCluster).min
+
+ // adjustment for excluding the node itself from the computation of the average dissimilarity
+ val currentClusterDissimilarity = if (pointClusterNumOfPoints == 1) {
+ 0
+ } else {
+ averageDistanceToCluster(pointClusterId) * pointClusterNumOfPoints /
+ (pointClusterNumOfPoints - 1)
+ }
+
+ (currentClusterDissimilarity compare neighboringClusterDissimilarity).signum match {
--- End diff --
Is this just expressing ...
```
if (currentClusterDissimilarity < neighboringClusterDissimilarity) {
...
} else if (currentClusterDissimilarity > neighboringClusterDissimilarity) {
...
} else {
...
}
```
If that's all it is, the if/else form seems more straightforward to my eyes.
The current version stacks operator notation, `signum`, and a `match`
statement on top of what is a simple three-way comparison.
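
To make the comparison concrete, here is a minimal sketch of both styles side by side. The branch bodies are elided in the diff above, so this sketch assumes the textbook silhouette definition, s = (b - a) / max(a, b); the object name, method names, and the short parameter names `a` and `b` are hypothetical and not taken from the PR.

```scala
object SilhouetteBranches {
  // a: average dissimilarity of the point to its own cluster (hypothetical name)
  // b: average dissimilarity of the point to the neighboring cluster (hypothetical name)

  // Style under review: three-way comparison via compare + signum + match.
  def withMatch(a: Double, b: Double): Double =
    (a compare b).signum match {
      case -1 => 1 - a / b   // a < b: point sits closer to its own cluster
      case 1 => b / a - 1    // a > b: point sits closer to the neighboring cluster
      case _ => 0.0          // a == b: point is on the boundary
    }

  // Suggested style: the same three cases spelled out with if/else.
  // Branch bodies assume the textbook definition, not the PR's actual code.
  def withIfElse(a: Double, b: Double): Double =
    if (a < b) 1 - a / b
    else if (a > b) b / a - 1
    else 0.0

  def main(args: Array[String]): Unit = {
    val samples = Seq((1.0, 4.0), (4.0, 1.0), (2.0, 2.0), (0.0, 3.0))
    samples.foreach { case (a, b) =>
      // Both styles should agree on every ordinary (non-NaN) input.
      assert(withMatch(a, b) == withIfElse(a, b))
    }
    println("both styles agree on all samples")
  }
}
```

One caveat on the equivalence: the two forms agree for ordinary values, but `compare` orders NaN as greater than every other value, while `<` and `>` are both false for NaN, so a NaN input would take different branches in the two versions.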
---