GitHub user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21218#discussion_r186926118
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -278,6 +279,7 @@ class BisectingKMeans @Since("2.0.0") (
val summary = new BisectingKMeansSummary(
model.transform(dataset), $(predictionCol), $(featuresCol), $(k))
model.setSummary(Some(summary))
+ instr.logNamedValue("clusterSizes", summary.clusterSizes.mkString(", "))
--- End diff --
Logging the sizes as a comma-separated string means the result has to be
parsed back if we want to do any analysis on it. How do we handle this in
other places? Do we always log in JSON format, for example? cc:
@WeichenXu123 @jkbradley
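
To illustrate the JSON option, here is a minimal sketch, not a proposal for the final API. `ClusterSizesJson` and `toJsonArray` are hypothetical names that do not exist in Spark, and the sketch assumes `summary.clusterSizes` is an `Array[Long]`. It avoids a JSON library only to stay self-contained; inside Spark a JSON library would probably be the better choice.

```scala
// Hypothetical helper (not part of Spark): renders cluster sizes as a JSON array
// so that log consumers can use any JSON parser instead of splitting strings.
object ClusterSizesJson {
  // Long values need no escaping, so wrapping the comma-joined values in
  // brackets is enough to produce a valid JSON array.
  def toJsonArray(sizes: Array[Long]): String =
    sizes.mkString("[", ",", "]")
}

// Assumed call site, mirroring the line in the diff above:
//   instr.logNamedValue("clusterSizes", ClusterSizesJson.toJsonArray(summary.clusterSizes))
```

With this shape, the logged value for sizes 3, 5 and 2 would be `[3,5,2]` rather than `3, 5, 2`.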