Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17117#discussion_r104168600
--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -152,6 +158,35 @@ class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultR
val kmeans = new KMeans()
testEstimatorAndModelReadWrite(kmeans, dataset, KMeansSuite.allParamSettings, checkModelData)
}
+
+ test("training with initial model") {
+ val kmeans = new KMeans().setK(2).setSeed(1)
+ val model1 = kmeans.fit(rData)
+ val model2 = kmeans.setInitMode("initialModel").setInitialModel(model1).fit(rData)
+ model2.clusterCenters.zip(model1.clusterCenters)
+ .foreach { case (center2, center1) => assert(center2 ~== center1 absTol 1E-8) }
+ }
+
+ test("training with initial model, error cases") {
+ val kmeans = new KMeans().setK(k).setSeed(1).setMaxIter(1)
+
+ // Sets initMode with 'initialModel', but does not specify initial model.
+ intercept[IllegalArgumentException] {
--- End diff --
I disagree with the approach taken in the other PR, for the following reason.
In that PR, calling ```setInitialModel(model)``` also calls
```set(initMode, "initialModel")``` internally. Consider the following scenario:
```
// Users want to start from an initial model.
val kmeans = new KMeans().setInitialModel(initialModel)
// The model is fitted with a warm start.
val model1 = kmeans.fit(dataset)
// Then they want to try another initialization mode, for example "k-means||".
// With the code path in #11119, this fit would still start from the initial model.
// We could change that by modifying mllib.clustering.KMeans, but I think it is confusing.
val model2 = kmeans.setInitMode("k-means||").fit(dataset)
```
Another scenario: users set ```initialModel``` by mistake but still want to
start in ```random``` mode. They will be confused about what happened.
So I would prefer to have users set ```initMode``` to ```initialModel```
explicitly and set ```initialModel``` to the corresponding model. Otherwise, we
just throw an exception so that users can correct their settings. I'm OK with
adding a test for the first case.
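
To make the proposed rule concrete, here is a minimal, self-contained sketch of the explicit check. It is not the code in this PR or in Spark: `InitSettings`, `InitValidation`, and `resolve` are hypothetical names, and a plain vector of centers stands in for a `KMeansModel`. It also assumes that an initial model combined with any other ```initMode``` should throw, which is my reading of "otherwise, we just throw exceptions".

```scala
// Hypothetical stand-in for the two params discussed above (not Spark's classes).
final case class InitSettings(
    initMode: String,
    initialModel: Option[Vector[Vector[Double]]])

object InitValidation {
  val InitialModelMode = "initialModel"

  /**
   * Returns the starting centers for a warm start, or None for any other mode.
   * Throws IllegalArgumentException when the two settings disagree, so that
   * the mode is never switched implicitly behind the user's back.
   */
  def resolve(settings: InitSettings): Option[Vector[Vector[Double]]] =
    (settings.initMode, settings.initialModel) match {
      case (InitialModelMode, Some(centers)) =>
        Some(centers)
      case (InitialModelMode, None) =>
        throw new IllegalArgumentException(
          "initMode is 'initialModel' but no initial model was set.")
      case (mode, Some(_)) =>
        // Assumption: an unused initial model is treated as a user error.
        throw new IllegalArgumentException(
          s"An initial model was set but initMode is '$mode'.")
      case (_, None) =>
        None
    }
}
```

With a check like this, the first scenario behaves predictably: switching to "k-means||" while an initial model is still set fails fast instead of silently warm-starting.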