GitHub user jkbradley commented on the issue:
https://github.com/apache/spark/pull/14547
@sethah AFAIK, the original gradient boosting algorithm was generic, not
specific to trees. That's Algorithm 1 in
[https://statweb.stanford.edu/~jhf/ftp/trebst.pdf], and it's what MLlib
currently implements.
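To show why Algorithm 1 is generic, here is a minimal Python sketch of its steps. It is not MLlib's implementation. It starts from the best constant F_0, then repeats three steps each round: compute the pseudo-residuals (the negative gradient of the loss at the current model), fit a base learner to them by least squares, and line-search a step size rho_m on the original loss. Nothing in the loop assumes the base learner is a tree. The base learner here is deliberately a least-squares line (`fit_linear`). The helper names, the golden-section line search, and the search interval are illustrative assumptions, not anything from the paper or from Spark.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a base learner maps (inputs, targets) to a fitted function h(x).
BaseLearner = Callable[[Sequence[float], Sequence[float]], Callable[[float], float]]


def squared_loss(y: float, f: float) -> float:
    return 0.5 * (y - f) ** 2


def squared_neg_grad(y: float, f: float) -> float:
    return y - f  # -dL/df for the squared loss above


def abs_loss(y: float, f: float) -> float:
    return abs(y - f)


def abs_neg_grad(y: float, f: float) -> float:
    return 0.0 if y == f else (1.0 if y > f else -1.0)  # sign(y - f)


def fit_linear(xs: Sequence[float], targets: Sequence[float]) -> Callable[[float], float]:
    """Least-squares line fit: an intentionally non-tree base learner."""
    n = len(xs)
    mx, mt = sum(xs) / n, sum(targets) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, targets))
    slope = sxt / sxx if sxx > 0 else 0.0
    intercept = mt - slope * mx
    return lambda x: intercept + slope * x


def line_search(f: Callable[[float], float], lo: float = -10.0, hi: float = 10.0,
                iters: int = 100) -> float:
    """Golden-section search for a minimizer of a unimodal f on [lo, hi] (assumed method)."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def gradient_boost(xs, ys, loss, neg_gradient, base_learner: BaseLearner, n_rounds: int):
    # F_0 = argmin_rho sum_i L(y_i, rho): the best constant model.
    f0 = line_search(lambda rho: sum(loss(y, rho) for y in ys), min(ys), max(ys))
    stages: List[Tuple[float, Callable[[float], float]]] = []  # (rho_m, h_m)

    def predict(x: float) -> float:
        return f0 + sum(rho * h(x) for rho, h in stages)

    for _ in range(n_rounds):
        preds = [predict(x) for x in xs]
        # Pseudo-residuals: negative gradient of the loss at F_{m-1}.
        residuals = [neg_gradient(y, p) for y, p in zip(ys, preds)]
        # Fit the base learner h_m to the pseudo-residuals by least squares.
        h = base_learner(xs, residuals)
        hs = [h(x) for x in xs]
        # Line search: rho_m = argmin_rho sum_i L(y_i, F_{m-1}(x_i) + rho * h_m(x_i)).
        rho = line_search(lambda r: sum(loss(y, p + r * hv) for y, p, hv in zip(ys, preds, hs)))
        stages.append((rho, h))  # F_m = F_{m-1} + rho_m * h_m
    return predict


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]
    model = gradient_boost(xs, ys, squared_loss, squared_neg_grad, fit_linear, n_rounds=3)
    print([round(model(x), 6) for x in xs])
```

Swapping `fit_linear` for a regression-tree fitter gives tree-based gradient boosting without changing the loop. Tree-specific refinements, such as re-estimating each leaf's value separately for the loss, would go beyond this generic version.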
I agree with your intuition about options 3 > 2 > 1 and with encouraging users
to use option 3 via our API. I'd be OK with disallowing option 1. As a
software engineer, I'd want to keep allowing 1 for backward compatibility,
since behavior and algorithms are part of the API. But as an ML person, I'd be
OK with not allowing 1 at all in the future, to keep users from doing the
wrong thing. Combining these, I'd recommend:
* For now, we make 2 the default behavior but still allow 1 (as this PR
does).
* In the future, we make 3 the default behavior, maybe allow 2, and stop
allowing 1.
> "loss-based" What exactly does that mean to the user?
If this is unclear, then let's make the documentation for that Param
clearer and/or use a more intuitive name such as "auto."