Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11119#discussion_r82215606
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
    @@ -303,6 +312,10 @@ class KMeans @Since("1.5.0") (
       @Since("1.5.0")
       def setSeed(value: Long): this.type = set(seed, value)
     
    +  /** @group setParam */
    +  @Since("2.1.0")
    +  def setInitialModel(value: KMeansModel): this.type = set(initialModel, value)
    --- End diff --
    
    There was some discussion of this earlier in this PR (back in March :). If
    the above is the desired behavior, we still need to check that `k` and the
    initial model agree, since you can set the initial model and then set `k`.
    I tested it and an error is still thrown, but it comes from the mllib
    KMeans instead. We should check this explicitly in ML. I prefer the
    following behavior:
    
    * If `isSet(initialModel) && isSet(k)`, check at train time that they are
      equal and throw an error if not
    * If `isSet(initialModel) && !isSet(k)`, set `k` to the initial model's `k`
      at train time (maybe log a warning)
    
    Actually, the current behavior is essentially equivalent. But we still
    need a test that checks an error is thrown when the two mismatch, and we
    still need to check for that case inside the train method.
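
    A minimal sketch of the check described above, kept independent of Spark
    so it can be run on its own. `InitialModelCheck`, `resolveK`, and its
    parameters are hypothetical names, not part of the PR: `userK` stands in
    for `k` when `isSet(k)` is true, and `initialModelK` for the number of
    cluster centers in the initial model when `isSet(initialModel)` is true.

```scala
// Hypothetical helper, not Spark API: decides which k to train with.
object InitialModelCheck {
  def resolveK(userK: Option[Int], initialModelK: Option[Int], defaultK: Int): Int =
    (userK, initialModelK) match {
      // Both set: they must agree, otherwise fail early with a clear message.
      case (Some(k), Some(modelK)) =>
        require(k == modelK,
          s"k ($k) does not match the number of centers in the initial model ($modelK)")
        k
      // Only the initial model is set: adopt its k (a warning could be logged here).
      case (None, Some(modelK)) => modelK
      // Only k is set: use it as-is.
      case (Some(k), None) => k
      // Neither is set: fall back to the default value of k.
      case (None, None) => defaultK
    }
}
```

    With this shape, the mismatch case fails with an
    `IllegalArgumentException` from `require`, which is the kind of error a
    unit test for the train method could assert on.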


