[GitHub] spark issue #14547: [SPARK-16718][MLlib] gbm-style treeboost

vlad17 Sat, 15 Oct 2016 11:10:56 -0700

Github user vlad17 commented on the issue:

    https://github.com/apache/spark/pull/14547
  
    @sethah You raise good points.
    
    Regarding (1), I don't know if it is actually true. I don't want to speak 
for @jkbradley, but I was just going off of "software engineering intuition" 
about backward compatibility of the algorithm's behavior. But let's consider an 
analogous example: if LogisticRegression were using regular batch GD, and we 
moved it to L-BFGS, it wouldn't make much sense to offer a new option for "gd".
    
    I think the question is whether reverting to the original behavior is 
common enough to merit a larger, clunkier, and more confusing API. And as the 
notion of "original" will keep changing over time, I'm starting to see the 
attractiveness of @sethah's original proposal to get rid of this option 
entirely, and let us do whatever we want under the hood impurity-wise.
    
    **TL;DR:** At no point can I see a data scientist saying "you know what 
will help my l1 error? A mean predictor!"
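    
    To make that concrete, here is a minimal sketch (plain Python, not code 
from this PR; `l1_error` and `leaf_labels` are hypothetical names made up for 
illustration) comparing the l1 error of a mean prediction and a median 
prediction on the labels of a single leaf:

```python
from statistics import mean, median


def l1_error(labels, prediction):
    """Total absolute error of one constant prediction over a leaf's labels."""
    return sum(abs(y - prediction) for y in labels)


# Hypothetical labels that ended up in one leaf, including a single outlier.
leaf_labels = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_pred = mean(leaf_labels)      # the prediction a variance-based leaf would make
median_pred = median(leaf_labels)  # the constant that minimizes absolute error

mean_l1 = l1_error(leaf_labels, mean_pred)
median_l1 = l1_error(leaf_labels, median_pred)

print(f"mean prediction   {mean_pred}: l1 error {mean_l1}")
print(f"median prediction {median_pred}: l1 error {median_l1}")
```

    On this toy leaf the median prediction has the lower l1 error, which is the 
general property of the median as a constant predictor under absolute loss.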
    
    The strongest point in favor of this that comes to mind is the following: 
the people changing the impurity metric are going to be people tuning a GBT 
model, but there's no good reason to use variance-based impurity with mean 
predictions for a loss that isn't optimized by those choices! If model tuning 
that compares `.setImpurity("variance")` against `.setImpurity("loss-based")` 
happens to show that variance does better under CV, then all you've done is 
grid search over GBT model parameters to overfit to noise in your data.



