[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

srowen Tue, 13 Nov 2018 13:40:37 -0800

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22986#discussion_r233232112
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
    @@ -538,7 +538,7 @@ private[ml] object GBTClassifierParams {
         Array("logistic").map(_.toLowerCase(Locale.ROOT))
     }
     
    -private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams {
    +private[ml] trait GBTClassifierParams extends GBTParams with TreeRegressorParams {
    --- End diff --
    
    I looked into this further and see that setImpurity is deprecated; I 
presume the intent is for callers to use set(impurity, ...) instead. That path 
can't be overridden in subclasses, so scratch that. Overriding setImpurity 
would have been just fine IMHO, but it's not going to last anyway.
    
    I do agree this is why classifiers and regressors have different traits, 
but I don't think that means a classifier should extend TreeRegressorParams 
just because its parameters happen to match.
    
    One option is to let the definition of `impurity` itself be overridden. 
That seems OK. Or we could add a new 'VarianceClassifier' trait, or something 
like it, that defines this variance-only impurity parameter, and have both 
TreeRegressorParams and GBTClassifierParams extend it. It is a little wacky, 
but quite reasonable from an OOP perspective, I think.
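    
    A rough sketch of the two options, to make the comparison concrete. It uses simplified stand-ins rather than Spark's actual Param machinery: `StringParam`, the `...Sketch` traits, `HasVarianceImpuritySketch`, and the `Option1`/`Option2` objects are all hypothetical names invented for illustration, and the real traits carry far more than this.

```scala
import java.util.Locale

// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
object ImpuritySketch {

  // Hypothetical minimal parameter: a name, a default, and the accepted values.
  final case class StringParam(name: String, default: String, allowed: Set[String]) {
    def isValid(value: String): Boolean =
      allowed.contains(value.toLowerCase(Locale.ROOT))
  }

  // Option 1: the classifier trait declares `impurity`, and the GBT trait
  // overrides the definition itself to accept only "variance".
  trait TreeClassifierParamsSketch {
    def impurity: StringParam =
      StringParam("impurity", "gini", Set("entropy", "gini"))
  }

  trait GBTClassifierParamsOption1 extends TreeClassifierParamsSketch {
    override def impurity: StringParam =
      StringParam("impurity", "variance", Set("variance"))
  }

  // Option 2: a shared trait owns the variance-only parameter, and both the
  // regressor params and the GBT classifier params extend it.
  trait HasVarianceImpuritySketch {
    val impurity: StringParam =
      StringParam("impurity", "variance", Set("variance"))
  }

  trait TreeRegressorParamsSketch extends HasVarianceImpuritySketch
  trait GBTClassifierParamsOption2 extends HasVarianceImpuritySketch

  // Concrete instances so the two designs can be inspected side by side.
  object Option1 extends GBTClassifierParamsOption1
  object Option2 extends GBTClassifierParamsOption2
}
```

    With either option the GBT classifier ends up with a variance-only `impurity`. The difference is that option 2 never makes the classifier a subtype of the regressor params, which is the coupling objected to above.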



