Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/17151#discussion_r104274732
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
---
@@ -372,16 +372,18 @@ class DecisionTreeClassifierSuite
// Categorical splits with tree depth 2
val categoricalData: DataFrame =
TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
- testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
checkModelData)
+ testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
--- End diff --
@sethah Thanks for your kindly remind. I was planing to write as your
suggestion before I'm thinking:
* Actually all ```Model``` only need to extends a fraction of ```Params```
from ```Estimator```, so all ```testEstimatorAndModelReadWrite(estimator,
dataset, testParams, checkModelData)``` should be changed to
```testEstimatorAndModelReadWrite(estimator, dataset, testEstimatorParams,
testModelParams, checkModelData)``` eventually. I explicitly write with the
later way in test suites to remind developers should separate their estimator
and model params when adding new algorithms' read/write test. I'm afraid that
developers are not aware of the separation if they refer other test suites and
find almost all test cases only pass in ```testParams```.
* Though in the currently change, we pass in ```allParamSettings``` to
both ```testEstimatorParams``` and ```testModelParams```, this is because they
share the same params set. Others like ```ALS``` will be pass in separate
params. I think we should push forward to refactor ```***``` and ```***Model```
to separate their params, which could make models more succinct.
* If this is a public API, I totally agree with you. However, this is an
internal auxiliary function, I think all test cases will need to pass in
separate params eventually, so I settle a matter at one go.
This is my two cents, I'm still open to hear your thoughts. If you have
strongly opinion, I can update according your suggestion. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]