Github user sethah commented on the issue:
https://github.com/apache/spark/pull/14547
TBH, even after reading many of those papers, I'm not certain exactly what
constitutes "TreeBoost". From the following excerpt, it seems to me that
TreeBoost is defined simply by making terminal node updates that minimize the
boosting loss, and *not* by minimizing that loss when splitting the tree nodes.
````
The terminal node updates are based on medians. An alternative approach
would be to build a tree directly to minimize the loss criterion.
````
That being said, I'm not sure that reading is right. Either way, I don't think
there's a much better way to implement this than coupling the loss and impurity,
since we need to collect certain sufficient statistics to make terminal node
updates anyway.
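
To make the distinction concrete, here is a minimal sketch of one boosting step under absolute-error loss, as I read the excerpt. It is not Spark's implementation: the function names (`best_split`, `lad_boost_step`), the single-feature depth-1 tree, and the toy data are all assumptions made up for illustration. The split is chosen with plain variance impurity on the pseudo-residuals, and only the terminal node values are set to minimize the loss (the median of the residuals in each leaf).

```python
from statistics import mean, median


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sum_sq_dev(values):
    """Unnormalized variance impurity: sum of squared deviations from the mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, targets):
    """Pick the threshold on one feature that minimizes variance impurity of targets."""
    best_cost, best_threshold = None, None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is non-empty
        left = [g for x, g in zip(xs, targets) if x <= t]
        right = [g for x, g in zip(xs, targets) if x > t]
        cost = sum_sq_dev(left) + sum_sq_dev(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_threshold = cost, t
    return best_threshold


def lad_boost_step(xs, ys, preds):
    """One boosting step for absolute-error loss with a depth-1 tree (hypothetical helper)."""
    residuals = [y - p for y, p in zip(ys, preds)]
    # Negative gradient of |y - F(x)| with respect to F(x) is sign(y - F(x)).
    pseudo_residuals = [sign(r) for r in residuals]
    # The split uses variance impurity on the pseudo-residuals, not the loss itself.
    threshold = best_split(xs, pseudo_residuals)
    # Terminal node update: the median of the residuals minimizes absolute error in a leaf.
    left_value = median(r for x, r in zip(xs, residuals) if x <= threshold)
    right_value = median(r for x, r in zip(xs, residuals) if x > threshold)
    return threshold, left_value, right_value


# Toy data (made up): the last label is an outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.5, 0.5, 4.5, 6.0, 30.0]
preds = [median(ys)] * len(ys)  # initial model: the median of the labels, 3.0

threshold, left_value, right_value = lad_boost_step(xs, ys, preds)
print(threshold, left_value, right_value)
```

On this toy data the split lands at `x <= 3` and the leaf values are -2.0 and 3.0. The right leaf's mean residual would be about 10.67 because of the outlier, so the median update is what makes the leaf prediction follow the absolute-error loss. It also shows why the leaf update needs per-leaf statistics (here, the raw residuals) beyond what the impurity calculation alone collects.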
Thanks for your notes and clarification!