[GitHub] spark pull request #20600: [SPARK-23412][ML] Add cosine distance to Bisectin...

mgaido91 Thu, 01 Mar 2018 09:33:07 -0800

GitHub user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20600#discussion_r171634107
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeansModel.scala ---
    @@ -155,34 +183,55 @@ object BisectingKMeansModel extends Loader[BisectingKMeansModel] {
           spark.createDataFrame(data).write.parquet(Loader.dataPath(path))
         }
     
    -    private def getNodes(node: ClusteringTreeNode): Array[ClusteringTreeNode] = {
    -      if (node.children.isEmpty) {
    -        Array(node)
    -      } else {
    -        node.children.flatMap(getNodes(_)) ++ Array(node)
    -      }
    -    }
    -
    -    def load(sc: SparkContext, path: String, rootId: Int): BisectingKMeansModel = {
    +    def load(sc: SparkContext, path: String): BisectingKMeansModel = {
    --- End diff --
    
    Yes, but this is the `load` method of the `SaveLoadV1_0` object, which is
    marked `private[clustering]`. The public `load` method's signature is
    unchanged, so I don't think this is a problem.

    I don't think this change can be avoided. If we don't update this and a
    user happens to use the `mllib` implementation instead of `ml`, they can
    set the distance measure successfully, but once they save the model and
    load it back, that information is lost and the model defaults to the
    Euclidean distance measure.
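
To make the save/load concern concrete, here is a minimal sketch in plain Scala with no Spark dependency. Everything in it is hypothetical and for illustration only: `ToyModel`, `saveV1`, `saveV2`, and the metadata keys are stand-ins, not the actual `BisectingKMeansModel` persistence code. It only shows how a metadata layout that omits the distance measure silently falls back to the default on load.

```scala
// Toy stand-in for a clustering model; NOT Spark's BisectingKMeansModel.
final case class ToyModel(k: Int, distanceMeasure: String)

object ToyModel {
  // Assumed default, mirroring the "defaults to euclidean" behavior described above.
  val DefaultDistance = "euclidean"

  // Hypothetical older layout: the distance measure is never written.
  def saveV1(m: ToyModel): Map[String, String] =
    Map("k" -> m.k.toString)

  // Hypothetical updated layout: the distance measure is persisted as well.
  def saveV2(m: ToyModel): Map[String, String] =
    Map("k" -> m.k.toString, "distanceMeasure" -> m.distanceMeasure)

  // The loader falls back to the default when the field is absent.
  def load(meta: Map[String, String]): ToyModel =
    ToyModel(meta("k").toInt, meta.getOrElse("distanceMeasure", DefaultDistance))
}

object RoundTripDemo {
  def main(args: Array[String]): Unit = {
    val model = ToyModel(k = 4, distanceMeasure = "cosine")
    // Old layout: the user's choice is lost on the round trip.
    println(ToyModel.load(ToyModel.saveV1(model)).distanceMeasure) // euclidean
    // Updated layout: the choice survives.
    println(ToyModel.load(ToyModel.saveV2(model)).distanceMeasure) // cosine
  }
}
```

In this sketch the loader's signature is identical for both layouts; only the persisted metadata differs, which is the same shape as the argument that the user-facing `load` does not need to change.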



