Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22764#discussion_r226520698
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala
 ---
    @@ -225,13 +227,14 @@ object BisectingKMeansModel extends 
Loader[BisectingKMeansModel] {
           assert(formatVersion == thisFormatVersion)
           val rootId = (metadata \ "rootId").extract[Int]
           val distanceMeasure = (metadata \ "distanceMeasure").extract[String]
    +      val trainingCost = (metadata \ "trainingCost").extract[Double]
    --- End diff --
    
    - Could you avoid modifying loading model code in "mllib" package, but 
modifying code in "ml" package, i.e., the class 
`ml.clustering.BisectingKMeansModel.BisectingKMeansModelReader`, you can 
reference the `KMeans` code: `ml.clustering.KMeansModel.KMeansModelReader`.
    
    - And, +1 with @viirya mentioned, we should keep model loading 
compatibility, add a version check (when >= 2.4) then we load "training cost" . 
Note that add these in 
`ml.clustering.BisectingKMeansModel.BisectingKMeansModelReader`.
    
    - And, could you also add version check (when >= 2.4) then we load 
"training cost" into `ml.clustering.KMeansModel.KMeansModelReader` ? 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to