Github user sethah commented on the issue:
    So, taking a look at the current patch, the API for this "loss-based" 
impurity feels clunky and a bit confusing. To enumerate, we have the following 
    **1. Completely decoupled loss and impurity** - Spark 2.0 and before
    **2. Terminal node updates** - trees trained with _some_ impurity, but 
predictions are optimal (this patch), aka "TreeBoost"
    **3. Train trees with loss directly**-  this incorporates both terminal 
node refinements and minimizing the loss directly during tree training. This 
could be done in the future and is how XGBoost works.
    I don't exactly see why we need to maintain 1 at all. Terminal node 
refinements should be a strict improvement over 1. By attempting to maintain 
both options, we expose an "impurity" parameter for GBTs, which IMO does not 
need to be public. To me, it is confusing also. e.g.
    scala> gbtc.getImpurity
    res1: String = loss-based
    What exactly does that mean to the user? I think that, ideally in the 
future, we would support 2 and 3 and users could select between the two. I can 
envision there being trade-offs between 2 and 3, but in theory 2 should be a 
strict superset of 1. 
    AFAIK, scikit does not expose the option to perform terminal-node updates 
to the user and I am not even sure there is a paper documenting gradient tree 
boosting without explicitly performing the terminal node updates. For instance, 
we have previously referred to this paper (please correct me if I'm 
wrong here) in the code as justification for describing the current approach as 
"Stochastic Gradient Boosting." But in the paper both sections 1 and 2 use 
terminal node refinements.
    Silently changing the algorithm could be tricky, but if we can be certain 
that this change is a strict improvement then I'm not sure it's a problem. I'm 
curious to hear others' thoughts on this.

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at or file a JIRA ticket
with INFRA.

To unsubscribe, e-mail:
For additional commands, e-mail:

Reply via email to