GitHub user vlad17 commented on the issue:
https://github.com/apache/spark/pull/14547
@sethah Thanks for the FYI. I'm fairly confident it will help, since we're now
directly optimizing the loss function. However, it would be nice to demonstrate
this. Unfortunately, the example I linked above uses a skewed dataset.
The only estimator whose behavior changed is GBTClassifier (the Bernoulli
predictions now use a Newton-Raphson step rather than guessing the mean). And
since the raw prediction column is unavailable for GBTClassifier, I can't
really compare the classifiers sensibly on skewed datasets: AUC is out of the
question.
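For concreteness, here's a minimal sketch of that constraint (the `train`/`test`
DataFrames and the parameter choices are mine, not anything from this PR):
`BinaryClassificationEvaluator` computes AUC from `rawPrediction`, which
GBTClassifier doesn't emit, so only metrics over the hard `prediction` column,
such as F1, are available.
```scala
import org.apache.spark.ml.classification.GBTClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// train/test: hypothetical DataFrames with "label" and "features" columns.
val gbt = new GBTClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setMaxIter(50)

val predictions = gbt.fit(train).transform(test)
// predictions has a "prediction" column but no "rawPrediction", so
// BinaryClassificationEvaluator (areaUnderROC) can't be applied here;
// we can only fall back to a threshold-based metric, which is weak on
// skewed data.
val f1 = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("f1")
  .evaluate(predictions)
```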
I'm going to have to spend some time finding a "real" dataset that's not
skewed but is large enough to be meaningful, or just make an artificial one
(see the sketch below). spark-perf will also need to be re-run.
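For the artificial option, a rough sketch of what I have in mind (the sizes,
seed, and linear separator are placeholders of mine, nothing settled in this
PR): sample Gaussian features and label each point by a fixed random linear
separator through the origin, which gives a roughly 50/50 class balance.
```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gbt-comparison").getOrCreate()
import spark.implicits._

val numFeatures = 20
val seedRng = new scala.util.Random(0)
val weights = Array.fill(numFeatures)(seedRng.nextGaussian())

// Roughly balanced synthetic binary data: label = side of a fixed random
// linear separator through the origin.
val rows: Seq[(Double, Vector)] = (0 until 100000).map { i =>
  val rnd = new scala.util.Random(i)
  val x = Array.fill(numFeatures)(rnd.nextGaussian())
  val margin = weights.zip(x).map { case (w, v) => w * v }.sum
  (if (margin > 0) 1.0 else 0.0, Vectors.dense(x))
}
val data = rows.toDF("label", "features")
val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)
```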
Also, regarding the binary incompatibility failure: part of it was my fault,
and part was due to an incompatibility with a package-private method. I added
an exception for the package-private method's binary incompatibility - is that
OK?
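For reference, the exception goes into `project/MimaExcludes.scala` in the
usual way; roughly the pattern below, though the problem type and the fully
qualified name here are placeholders rather than the actual entry.
```scala
import com.typesafe.tools.mima.core._

// Sketch of an entry in project/MimaExcludes.scala; the problem type and the
// fully qualified name are placeholders for the package-private method that
// the MiMa check flagged on this PR.
lazy val gbtExcludes = Seq(
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.ml.classification.GBTClassifier.somePackagePrivateMethod")
)
```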