GitHub user sethah commented on the issue:
    A few observations:
    * Before this patch, users could not set an impurity. (In fact, if you call
`getImpurity` on a GBT classifier it returns "gini", which is not true. That
seems to be an unrelated bug.)
    * After this patch, users can technically set an impurity, but there are
really only two options: "loss-based" (which is ambiguous to me) and
"variance". Setting "variance" for a classifier could be confusing to anyone
without an understanding of GBT internals.
    * Neither scikit-learn's GBT nor R's gbm exposes an impurity API.
    * After this patch, the impurity defaults to "loss-based" when it may in
fact not be loss-based at all. For logistic loss in classification, we use a
variance impurity but tell the user we are using a loss-based impurity. This
is part of why I think it's confusing/unclear. I realize we _intend_ to have a
fully loss-based solution in the future, but we don't have it right now. It
seems quite misleading to say the impurity is "loss-based" when it truly is not.
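To make the logistic-loss case concrete: in gradient boosting each tree is a regression tree fit to the pseudo-residuals, so its splits are chosen by variance reduction on those residuals even though the boosted loss is logistic. The following is a minimal Python sketch, assuming Friedman's binomial-deviance formulation `log(1 + exp(-2*y*F))` with labels in {-1, +1}. The function names are hypothetical, and this is not Spark's implementation (Spark's constant factors in the log loss may differ).

```python
import math


def pseudo_residuals(labels, scores):
    """Negative gradient of Friedman's binomial deviance log(1 + exp(-2*y*F))
    for labels y in {-1, +1} and current model scores F (assumed formulation)."""
    return [2.0 * y / (1.0 + math.exp(2.0 * y * f)) for y, f in zip(labels, scores)]


def variance(values):
    """Population variance of a node, i.e. the 'variance' impurity."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_gain(parent, left, right):
    """Weighted variance reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    return (variance(parent)
            - len(left) / n * variance(left)
            - len(right) / n * variance(right))


labels = [1, 1, -1, -1]
scores = [0.0, 0.0, 0.0, 0.0]            # initial model F = 0
r = pseudo_residuals(labels, scores)     # residuals the tree is fit to
gain = variance_gain(r, r[:2], r[2:])    # split criterion is variance, not log loss
print(r, gain)
```

Under these assumptions the split that separates the two classes has a variance gain of 1.0; nothing in the split criterion refers to the logistic loss directly.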
    If we feel that we _must_ give users the option to use terminal node
refinements or not (it seems that is the consensus), then exposing the
impurity as a settable param is one way. But impurity is really a binary choice
right now: use terminal node refinements or don't (I'm omitting the special
case of variance). Alternatively, we could expose an `expertParam` that could
support "treeBoost" and "gradientBoost" for now, and potentially "xgboost" in
the future.
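The binary choice can be sketched as the rule that sets a leaf's output value. Without refinement ("gradientBoost"), a leaf of a variance-impurity tree predicts the mean pseudo-residual; with Friedman-style terminal node refinement ("treeBoost"), the leaf value is replaced by a one-step Newton estimate, which for binomial deviance is `sum(r) / sum(|r| * (2 - |r|))`. This is a hypothetical Python sketch, again assuming labels in {-1, +1}; `leaf_value` and its `mode` strings only mirror the proposed `expertParam` values and are not an existing Spark API.

```python
def leaf_value(residuals, mode):
    """Output value of a terminal node for binomial deviance (hypothetical helper).

    mode="gradientBoost": no refinement; the leaf predicts the mean
        pseudo-residual, as a regression tree grown with variance impurity does.
    mode="treeBoost": Friedman-style terminal node refinement, a single
        Newton step sum(r) / sum(|r| * (2 - |r|)).
    """
    if not residuals:
        raise ValueError("a leaf must contain at least one residual")
    if mode == "gradientBoost":
        return sum(residuals) / len(residuals)
    if mode == "treeBoost":
        denom = sum(abs(r) * (2.0 - abs(r)) for r in residuals)
        if denom == 0.0:
            # Guard against division by zero; returning 0.0 is an
            # arbitrary choice for this sketch.
            return 0.0
        return sum(residuals) / denom
    # Other modes (e.g. a future "xgboost") are not implemented here.
    raise ValueError(f"unsupported mode: {mode!r}")


leaf = [0.5, 0.5, 1.0]
print(leaf_value(leaf, "gradientBoost"))  # mean of the residuals
print(leaf_value(leaf, "treeBoost"))      # Newton-step refinement
```

Under these assumptions both modes grow the same tree and differ only in the leaf values, which is why the setting behaves like a boosting-mode switch rather than an impurity.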
    You could argue that being confusing isn't much of a detriment, since the
only users likely to change this setting are probably those who understand it
well enough, and that may be true. I just want to make sure that 1) we agree
we have to expose this as an option, and 2) we choose the best way to do it,
given that 1) is true.