GitHub user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20600#discussion_r171634107
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
@@ -155,34 +183,55 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
           spark.createDataFrame(data).write.parquet(Loader.dataPath(path))
         }
    -    private def getNodes(node: ClusteringTreeNode): Array[ClusteringTreeNode] = {
    -      if (node.children.isEmpty) {
    -        Array(node)
    -      } else {
    -        node.children.flatMap(getNodes(_)) ++ Array(node)
    -      }
    -    }
    -
    -    def load(sc: SparkContext, path: String, rootId: Int): BisectingKMeansModel = {
    +    def load(sc: SparkContext, path: String): BisectingKMeansModel = {
--- End diff --
Yes, but this is the `load` method of the `SaveLoadV1_0` object, which is
marked `private[clustering]`. The signature of the real (public) `load`
method is unchanged, so I don't think this is a problem.

I don't think this change can be avoided. Without it, a user of the `mllib`
implementation (rather than `ml`) can set the distance measure successfully,
but after saving the model and loading it back, that information is lost and
the model defaults to the Euclidean distance measure.
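
To make the failure mode concrete, here is a minimal sketch in plain Scala. It does not use Spark: `ModelSketch`, `MetadataRoundTrip`, `saveWithoutMeasure`, `saveWithMeasure` and the `Map`-based metadata are hypothetical stand-ins invented for illustration, not the actual `BisectingKMeansModel` persistence code. It only shows why a loader falls back to the default when the saver never writes the field.

```scala
// Hypothetical stand-in for a clustering model; NOT Spark's BisectingKMeansModel.
final case class ModelSketch(k: Int, distanceMeasure: String)

object MetadataRoundTrip {
  // Assumed default, mirroring the "defaults to euclidean" behaviour described above.
  val DefaultMeasure: String = "euclidean"

  // A save format that does not know about the distance measure: only k is written.
  def saveWithoutMeasure(m: ModelSketch): Map[String, String] =
    Map("k" -> m.k.toString)

  // A save format that also persists the distance measure.
  def saveWithMeasure(m: ModelSketch): Map[String, String] =
    saveWithoutMeasure(m) + ("distanceMeasure" -> m.distanceMeasure)

  // The loader uses the stored measure if present, otherwise the default.
  def load(meta: Map[String, String]): ModelSketch =
    ModelSketch(
      k = meta("k").toInt,
      distanceMeasure = meta.getOrElse("distanceMeasure", DefaultMeasure)
    )
}
```

With `val m = ModelSketch(4, "cosine")`, `MetadataRoundTrip.load(MetadataRoundTrip.saveWithoutMeasure(m))` comes back with `distanceMeasure` set to `"euclidean"`, while the round trip through `saveWithMeasure` preserves `"cosine"`.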