Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/11119#discussion_r78669003
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -303,6 +322,29 @@ class KMeans @Since("1.5.0") (
@Since("1.5.0")
def setSeed(value: Long): this.type = set(seed, value)
+ /** @group setParam */
+ @Since("2.1.0")
def setInitialModel(value: KMeansModel): this.type = set(initialModel, value)
+
+ /** @group setParam */
+ @Since("2.1.0")
+ def setInitialModel(value: Model[_]): this.type = {
--- End diff --
Is the only reason we provide this method so that we can throw a better error
message? I am concerned about providing _three different_ setter methods for
this param. In particular, are we going to have to do this every time? There are
other ways to provide smarter error messages and more specific param docs, which
may be better than adding extra setters.

I see why we need to provide a way to set the initial model with just
cluster centers, but I think we should limit the "convenience" methods we add,
since we plan to extend this design to many other models. For instance, in
logistic regression, would we add:
````scala
def setInitialModel(coefficients: Vector, intercept: Double) = ...
def setInitialModel(coefficients: Vector) = setInitialModel(coefficients, 0.0)
def setInitialModel(coefficients: Matrix, intercept: Vector) = ...
def setInitialModel(coefficients: Matrix) = ...
````
I think we can get away with specifying only one setter method, as we do for
the other params. To let users specify the model from centers, we could add
a method like `KMeansModel.fromCenters(centers)` for them to call. I would
appreciate others' thoughts on this.
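
To make the single-setter idea concrete, here is a rough sketch. It uses simplified stand-in classes (`SketchKMeansModel`, `SketchKMeans`) rather than the real Spark types, and `fromCenters` is only the hypothetical factory suggested above, not an existing API. The validation messages are likewise assumptions about what "smarter errors" could look like.

```scala
// Stand-in for KMeansModel: NOT the Spark class, just enough to show the shape.
final case class SketchKMeansModel(clusterCenters: Vector[Vector[Double]]) {
  def k: Int = clusterCenters.length
}

object SketchKMeansModel {
  // Hypothetical factory mirroring the proposed KMeansModel.fromCenters(centers).
  // Validation lives here, so the estimator does not need extra setter overloads.
  def fromCenters(centers: Seq[Seq[Double]]): SketchKMeansModel = {
    require(centers.nonEmpty, "at least one cluster center is required")
    val dim = centers.head.length
    require(centers.forall(_.length == dim),
      s"all cluster centers must have dimension $dim")
    SketchKMeansModel(centers.map(_.toVector).toVector)
  }
}

// Stand-in for the KMeans estimator, exposing exactly one setter for the param.
final class SketchKMeans {
  private var initialModel: Option[SketchKMeansModel] = None

  def setInitialModel(value: SketchKMeansModel): this.type = {
    initialModel = Some(value)
    this
  }

  def getInitialModel: Option[SketchKMeansModel] = initialModel
}
```

With this shape, a user who only has centers would write something like `new SketchKMeans().setInitialModel(SketchKMeansModel.fromCenters(centers))`, and the same pattern would carry over to other models by adding a factory on the model side instead of more setters on the estimator.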