Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/11119
  
    @MLnick I'm not sure I understand what you're saying. Where are we 
discarding cluster centers? 
    
    Maybe we should say that the `initialModel` always takes precedence over 
`k`. So we can just ignore `k` when `initialModel` is set, and log a warning at 
train time that we are ignoring it. There are going to be tradeoffs either way, 
and I think that is reasonable behavior. I vote to ignore `k` when 
`initialModel` is set. That also alleviates DB's concern about the following 
situation (which would fail given the current logic):
    
    ````scala
    val km = new KMeans().setInitialModel(kEquals5Model)
    val model1 = km.fit(df)
    val model2 = km.setInitialModel(kEquals6Model).fit(df)
    ````
    Again, I think we can all agree there are tradeoffs. Let's see if we can 
agree on something for now and go with it. If someone feels really strongly, 
then maybe we can discuss it in another JIRA.
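
    To make the proposed precedence rule concrete, here is a rough sketch of how "ignore `k` when `initialModel` is set, and warn at train time" could look. `SketchKMeans`, `SketchKMeansModel`, and `resolveK` are hypothetical stand-ins invented for illustration, not the classes or methods in this PR, and the setters are immutable here purely to keep the example small.

```scala
// Hypothetical stand-in for a fitted model: it only carries its cluster centers.
final case class SketchKMeansModel(clusterCenters: Vector[Vector[Double]]) {
  def k: Int = clusterCenters.length
}

// Hypothetical stand-in for the estimator; not Spark's KMeans.
final case class SketchKMeans(
    k: Int = 2,
    initialModel: Option[SketchKMeansModel] = None) {

  def setK(value: Int): SketchKMeans = copy(k = value)

  def setInitialModel(model: SketchKMeansModel): SketchKMeans =
    copy(initialModel = Some(model))

  // Resolve the effective k at train time. The initial model always wins;
  // a warning message is returned only when the explicit k disagrees with it.
  def resolveK: (Int, Option[String]) = initialModel match {
    case Some(m) if m.k != k =>
      (m.k, Some(s"Ignoring k=$k: initialModel has ${m.k} cluster centers."))
    case Some(m) => (m.k, None)
    case None    => (k, None)
  }
}
```

    Under this rule, swapping a `k = 5` initial model for a `k = 6` one on the same estimator resolves to 6 instead of failing, at the cost of silently overriding any `k` the user set explicitly (apart from the warning).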

