Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22764#discussion_r226668414 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala --- @@ -225,13 +227,14 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] { assert(formatVersion == thisFormatVersion) val rootId = (metadata \ "rootId").extract[Int] val distanceMeasure = (metadata \ "distanceMeasure").extract[String] + val trainingCost = (metadata \ "trainingCost").extract[Double] --- End diff -- Thank you all for the comments and sorry for the late answer. Just a couple of notes on your comments @WeichenXu123 (I may be missing something, so please correct me if I am wrong): - I checked the `ml.clustering.KMeansModel.KMeansModelReader` and it doesn't store anything related to the summary. Summary is not recovered after load of the model, so I don't see any reason why we should modify the read/load of `ml.clustering.BisectingKMeansModel.BisectingKMeansModelReader`; - this model can read from previous versions, since this is version "2.0", which was introduced for Spark 2.4; for previous versions, we read/write version "1.0"; the version check method for versioning is used only for the `ml` package, not in `mllib` where we have this versioning approach; > I was told that this change just follows what we did for other models before. @cloud-fan Yes, let me link the PR for KMeans doing the same, which is: https://github.com/apache/spark/pull/20629. Just a final comment which I hope clarifies which is the source of the confusion here and the reason of the above comments by @viirya and @WeichenXu123: `trainingCost` here is a member of the `summary`, not a parameter of the model for the `ml.clustering.BisectingKMeansModel`. Instead, it is a member of the model for `mllib.clustering.BisectingKMeansModel` (we have no summary notion there...). So storing it for `mllib` is needed in order for the model read after persisting it being the same of the original one (I think it doesn't pass UTs otherwise). Storing it for the `ml`, instead, it is not needed, because the summary is not persisted. If we want to persist also the summary for `ml` package I think we should best create a separate JIRA and PR for it. Hope this clarifies (sorry for being so verbose).
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org