GitHub user sethah commented on the issue:

    https://github.com/apache/spark/pull/14547
  
    TBH, even after reading many of those papers, I'm not certain exactly what 
constitutes "TreeBoost". From the following excerpt, it seems to me that 
TreeBoost is defined simply by making terminal node updates that minimize the 
boosting loss, and *not* by minimizing that loss when splitting the tree nodes.
    
    ````
    The terminal node updates are based on medians. An alternative approach 
would be to build a tree directly to minimize the loss criterion.
    ````
    
    That being said, I'm not sure about this, and I don't think there's a much 
better way to implement it than coupling the loss and impurity, since we need 
to collect certain sufficient statistics to make the terminal node updates anyway. 
Thanks for your notes and clarification!
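
To make the terminal node update concrete, here is a minimal sketch in plain Python, not the Spark implementation. It assumes the tree structure is already fixed (the hypothetical `leaf_ids` list says which leaf each row falls into) and that the boosting loss is absolute error, for which the best constant per leaf is the median of the residuals in that leaf. The function names and the toy data are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def terminal_node_updates(leaf_ids, labels, predictions):
    """Return, per leaf, the constant that minimizes absolute error:
    the median of the residuals (label - current prediction) in that leaf."""
    residuals_by_leaf = defaultdict(list)
    for leaf, y, f in zip(leaf_ids, labels, predictions):
        residuals_by_leaf[leaf].append(y - f)
    return {leaf: median(res) for leaf, res in residuals_by_leaf.items()}


def absolute_loss(labels, predictions):
    """Sum of absolute errors over all rows."""
    return sum(abs(y - f) for y, f in zip(labels, predictions))


# Hypothetical toy data. How the tree was split is irrelevant here;
# only the leaf assignment of each row matters for the update.
leaf_ids = [0, 0, 0, 1, 1, 1]
labels = [1.0, 2.0, 10.0, -1.0, 0.0, 4.0]
predictions = [0.0] * 6  # current ensemble prediction before this tree

gamma = terminal_node_updates(leaf_ids, labels, predictions)
updated = [f + gamma[leaf] for leaf, f in zip(leaf_ids, predictions)]
```

On this toy data the median update gives a total absolute error of 14.0, whereas using the per-leaf mean of the residuals (what a variance-based tree would predict) gives about 17.3. The splits are the same in both cases; only the leaf values change.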


