Github user vlad17 commented on the issue:
https://github.com/apache/spark/pull/14547
@sethah You raise good points.
Regarding (1), I don't know if it is actually true. I don't want to speak
for @jkbradley, but I was just going off of "software engineering intuition"
about backwards compatibility of the algorithm's behavior. But let's consider an
analogous example - if LogisticRegression was using regular batch GD, and we
moved it to L-BFGS, it wouldn't make much sense to offer a new option for "gd".
I think the question is whether reverting to original behavior is common
enough to merit a larger, more clunky, and more confusing API. And as the
notion of "original" will be changing over time, I'm starting to see the
attractiveness of @sethah's original proposition to get rid of this option
entirely, and let us do whatever we want under the hood impurity-wise.
**TL;DR:** I can't picture a data scientist ever saying "you know what
will help my l1 error? A mean predictor!"
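To make the quip concrete, here is a minimal sketch in plain Python (not Spark code). It compares a mean and a median constant prediction on one set of targets. The `leaf_targets` values and the helper names are made up for illustration; the only fact relied on is the standard one that the median minimizes absolute error while the mean minimizes squared error.

```python
from statistics import mean, median


def l1_error(targets, prediction):
    """Mean absolute error of predicting one constant for every target."""
    return sum(abs(t - prediction) for t in targets) / len(targets)


def l2_error(targets, prediction):
    """Mean squared error of predicting one constant for every target."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)


# Hypothetical targets that ended up in a single leaf, with one outlier.
leaf_targets = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_prediction = mean(leaf_targets)
median_prediction = median(leaf_targets)

print("l1 with mean:  ", l1_error(leaf_targets, mean_prediction))
print("l1 with median:", l1_error(leaf_targets, median_prediction))
print("l2 with mean:  ", l2_error(leaf_targets, mean_prediction))
print("l2 with median:", l2_error(leaf_targets, median_prediction))
```

On these targets the median gives the lower l1 error and the mean gives the lower l2 error, so a mean predictor is working against an l1 objective.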
The strongest argument for removing the option, as I see it, is the following:
the people who would change the impurity metric are the people tuning a GBT
model, but there's no good reason to use variance-based impurity with mean
predictions for a loss that those choices don't optimize! If tuning that
compares `.setImpurity("variance")` against `.setImpurity("loss-based")`
happens to show, via CV, that variance does better, then all you've done is
grid search over GBT model parameters to overfit to noise in your data.
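The overfitting-to-noise point can be illustrated with a small simulation in plain Python (not Spark code). It assumes an idealized case that is not taken from the PR: both impurity settings have exactly the same true error, and each CV estimate is that error plus independent Gaussian noise. The function name and all numeric values are hypothetical.

```python
import random


def average_selected_cv_score(trials=10000, true_error=1.0, noise_sd=0.2, seed=0):
    """Average CV error of whichever of two equally good settings "wins".

    Assumption: both settings share the same true error, so any
    difference between their CV estimates is pure noise.
    """
    rng = random.Random(seed)
    selected_total = 0.0
    for _ in range(trials):
        cv_variance = rng.gauss(true_error, noise_sd)    # noisy estimate, setting A
        cv_loss_based = rng.gauss(true_error, noise_sd)  # noisy estimate, setting B
        # The grid search keeps the setting with the lower estimated error.
        selected_total += min(cv_variance, cv_loss_based)
    return selected_total / trials


true_error = 1.0
avg_selected = average_selected_cv_score(true_error=true_error)
print("true error of both settings:", true_error)
print("average CV error of the winner:", avg_selected)
```

Under these assumptions the winner's CV error looks better than the true error on average, even though neither setting is actually better. That gap is the noise the grid search has fit.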