GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19676#discussion_r155928871
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,14 @@ public static void main(String[] args) {
KMeans kmeans = new KMeans().setK(2).setSeed(1L);
KMeansModel model = kmeans.fit(dataset);
- // Evaluate clustering by computing Within Set Sum of Squared Errors.
- double WSSSE = model.computeCost(dataset);
- System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+ // Make predictions
+ Dataset<Row> predictions = model.transform(dataset);
+
+ // Evaluate clustering by computing Silhouette score
+ ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+ double silhouette = evaluator.evaluate(predictions);
+ System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --
Nit: "euclidean" should be capitalized as "Euclidean" in the printed message, but
it's not important to change unless you're touching the code again anyway.
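
For readers unfamiliar with the metric the message names, here is a minimal plain-Java sketch of a mean silhouette score using squared Euclidean distance. It is not Spark's `ClusteringEvaluator` implementation. The class name `SilhouetteSketch`, its helper methods, and the sample points are hypothetical, and the sketch assumes the data has at least two clusters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Spark.
public class SilhouetteSketch {

  // Squared Euclidean distance between two points of equal dimension.
  static double squaredEuclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Mean silhouette over all points; labels[i] is the cluster id of points[i].
  // Assumes at least two distinct cluster ids are present.
  static double silhouette(double[][] points, int[] labels) {
    int n = points.length;
    Map<Integer, Integer> sizes = new HashMap<>();
    for (int label : labels) {
      sizes.merge(label, 1, Integer::sum);
    }
    double total = 0.0;
    for (int i = 0; i < n; i++) {
      int ownSize = sizes.get(labels[i]);
      if (ownSize == 1) {
        continue; // a singleton cluster contributes a score of 0
      }
      // Sum of distances from point i to the other points, grouped by cluster.
      Map<Integer, Double> sums = new HashMap<>();
      for (int j = 0; j < n; j++) {
        if (i != j) {
          sums.merge(labels[j], squaredEuclidean(points[i], points[j]), Double::sum);
        }
      }
      // a: mean distance to the other members of the point's own cluster.
      double a = sums.get(labels[i]) / (ownSize - 1);
      // b: smallest mean distance to the members of any other cluster.
      double b = Double.POSITIVE_INFINITY;
      for (Map.Entry<Integer, Double> e : sums.entrySet()) {
        if (e.getKey() != labels[i]) {
          b = Math.min(b, e.getValue() / sizes.get(e.getKey()));
        }
      }
      double max = Math.max(a, b);
      if (max > 0.0) {
        total += (b - a) / max; // left at 0 when all distances are 0
      }
    }
    return total / n;
  }

  public static void main(String[] args) {
    // Made-up sample data: two well-separated pairs of points.
    double[][] points = {{0.0, 0.0}, {0.0, 1.0}, {10.0, 0.0}, {10.0, 1.0}};
    int[] labels = {0, 0, 1, 1};
    System.out.println(
        "Silhouette with squared Euclidean distance = " + silhouette(points, labels));
  }
}
```

Scores close to 1 indicate that points sit much nearer to their own cluster than to the nearest other cluster, which is the case for the sample data here.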
---