zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML]
Implement Tree-Based Feature Transformation for ML
URL: https://github.com/apache/spark/pull/25383#discussion_r313696881
##########
File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala
##########
@@ -455,7 +508,19 @@ private[ml] object GBTClassifierParams {
Array("logistic").map(_.toLowerCase(Locale.ROOT))
}
-private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity {
+private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity
+ with ProbabilisticClassifierParams {
+
+ override protected def validateAndTransformSchema(
Review comment:
@mgaido91 I would rather keep the current approach, because the
superclasses differ:
1. The `super.validateAndTransformSchema(schema, fitting, featuresDataType)`
called in `GBTClassifierParams` and `RandomForestClassifierParams` comes from
`ProbabilisticClassifierParams`, which checks the columns probabilityCol,
rawPredictionCol, featuresCol, labelCol, weightCol, and predictionCol.
2. The super method called in `RandomForestRegressorParams` and
`GBTRegressorParams` comes from `PredictorParams`, which only checks the
columns featuresCol, labelCol, weightCol, and predictionCol.
We could add two more traits, one for classification and one for
regression, such as `TreeEnsembleClassifierParams` and
`TreeEnsembleRegressorParams`.
However, I don't think this is worthwhile: each trait would have only two
subclasses, and it would make the hierarchy more complex.
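To illustrate the point about the super calls, here is a minimal
self-contained sketch. It is not the Spark code: every `*Sketch` name is
hypothetical, the method signature is simplified, and columns are modeled as
plain strings rather than a schema. It only shows that two overrides with the
same body resolve `super` to different parents and so check different columns.

```scala
object SchemaCheckSketch {
  // Hypothetical stand-in for PredictorParams: records the columns it checks.
  trait PredictorParamsSketch {
    protected def validateAndTransformSchema(cols: Set[String]): Set[String] =
      cols ++ Set("featuresCol", "labelCol", "weightCol", "predictionCol")
  }

  // Hypothetical stand-in for ProbabilisticClassifierParams: checks two more columns.
  trait ProbabilisticClassifierParamsSketch extends PredictorParamsSketch {
    override protected def validateAndTransformSchema(cols: Set[String]): Set[String] =
      super.validateAndTransformSchema(cols) ++ Set("rawPredictionCol", "probabilityCol")
  }

  // Classifier side: `super` resolves to ProbabilisticClassifierParamsSketch.
  trait GBTClassifierParamsSketch extends ProbabilisticClassifierParamsSketch {
    override protected def validateAndTransformSchema(cols: Set[String]): Set[String] =
      super.validateAndTransformSchema(cols) // any extra work of the real override is omitted
    def checkedCols: Set[String] = validateAndTransformSchema(Set.empty)
  }

  // Regressor side: the same body, but `super` resolves to PredictorParamsSketch.
  trait GBTRegressorParamsSketch extends PredictorParamsSketch {
    override protected def validateAndTransformSchema(cols: Set[String]): Set[String] =
      super.validateAndTransformSchema(cols) // any extra work of the real override is omitted
    def checkedCols: Set[String] = validateAndTransformSchema(Set.empty)
  }

  val classifierCols: Set[String] = (new GBTClassifierParamsSketch {}).checkedCols
  val regressorCols: Set[String] = (new GBTRegressorParamsSketch {}).checkedCols
}
```

Under these assumptions, sharing one override between the classifier and
regressor params would need an intermediate trait on each side, which is the
extra hierarchy mentioned above.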