GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/10803

    [SPARK-12875] [ML] Add Weight of Evidence and Information value to Spark.ml 
as a feature transformer

    jira: https://issues.apache.org/jira/browse/SPARK-12875
    As a feature transformer, WoE and IV enable one to:
    
    - Consider each variable's independent contribution to the outcome.
    - Detect linear and non-linear relationships.
    - Rank variables in terms of "univariate" predictive strength.
    - Visualize the correlations between the predictive variables and the
      binary outcome.
    
    http://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/ 
gives a good introduction to WoE and IV.
    
    The Weight of Evidence (WoE) value measures how well a grouping of a
    feature distinguishes between the two classes of a binary response (e.g.
    "good" versus "bad"). It is widely used for grouping continuous features
    or mapping categorical features to continuous values. It is computed from
    the basic odds ratio:
    (Distribution of positive outcomes) / (Distribution of negative outcomes)
    where "Distribution" refers to the proportion of positive or negative
    outcomes in the respective group, relative to the column totals. The WoE
    of a group is the natural logarithm of this ratio.
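
    As a rough illustration of the ratio described above, here is a minimal
    plain-Python sketch. It is not the code in this pull request: the function
    name weight_of_evidence, the 0/1 label encoding, and the choice to raise an
    error for groups with a zero count are all assumptions made for the
    example.

```python
import math
from collections import Counter


def weight_of_evidence(categories, labels):
    """Return {category: WoE} for a categorical feature and a 0/1 label.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # Count positive and negative outcomes per category.
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    woe = {}
    for c in set(pos) | set(neg):
        # Assumption: a group with no positives or no negatives is rejected,
        # because the ratio (or its logarithm) would be undefined.
        if pos[c] == 0 or neg[c] == 0:
            raise ValueError(f"category {c!r} lacks positive or negative rows")
        distr_pos = pos[c] / total_pos  # share of all positives in this group
        distr_neg = neg[c] / total_neg  # share of all negatives in this group
        woe[c] = math.log(distr_pos / distr_neg)
    return woe


# Toy data: "a" has 3 positives and 1 negative, "b" has 1 positive and 3 negatives.
categories = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
woe = weight_of_evidence(categories, labels)
```

    On this toy data, group "a" holds 3/4 of the positives and 1/4 of the
    negatives, so its WoE is ln(3); group "b" is the mirror image with WoE
    -ln(3).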
    
    The WoE recoding of features is particularly well suited for subsequent
    modeling with logistic regression or a multilayer perceptron (MLP).
    
    In addition, the Information Value (IV) can be computed from the WoE
    values; IV is a popular criterion for selecting variables in a predictive
    model.
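
    For reference, IV is commonly defined as the sum over groups of
    (Distribution of positives - Distribution of negatives) * WoE. The sketch
    below follows that common definition in plain Python. It is an
    illustration under assumptions (the name information_value, 0/1 labels,
    and rejecting groups with a zero count are all hypothetical choices), not
    the implementation in this pull request.

```python
import math
from collections import Counter


def information_value(categories, labels):
    """Return the IV of a categorical feature against a 0/1 label.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    # Count positive and negative outcomes per category.
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    iv = 0.0
    for c in set(pos) | set(neg):
        # Assumption: groups with a zero count are rejected rather than smoothed.
        if pos[c] == 0 or neg[c] == 0:
            raise ValueError(f"category {c!r} lacks positive or negative rows")
        distr_pos = pos[c] / total_pos
        distr_neg = neg[c] / total_neg
        group_woe = math.log(distr_pos / distr_neg)
        # Each term is non-negative: the difference and the log share a sign.
        iv += (distr_pos - distr_neg) * group_woe
    return iv


# Toy data: "a" has 3 positives and 1 negative, "b" has 1 positive and 3 negatives.
categories = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
iv = information_value(categories, labels)
```

    A feature whose groups all have the same mix of positives and negatives
    yields an IV of zero; larger values indicate stronger separation between
    the two outcomes.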
    
    Next: Currently only categorical features are supported. A follow-up will
    add an estimator that determines the proper grouping for continuous
    features.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark woe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10803.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10803
    
----
commit 0b360c4f54ee23efd5c29785e77d75217b5a0893
Author: Yuhao Yang <[email protected]>
Date:   2016-01-14T09:43:52Z

    draft for woe

commit a674bb0190a07c9af1f210ae7acba89d1188be57
Author: Yuhao Yang <[email protected]>
Date:   2016-01-14T15:49:05Z

    add iv

commit c2beb8b51a9a80f94da9de59f56988647050addf
Author: Yuhao Yang <[email protected]>
Date:   2016-01-16T08:36:05Z

    Merge remote-tracking branch 'upstream/master' into woe

commit c6239383914a4c8bde2c4afb22398399803e55b0
Author: Yuhao Yang <[email protected]>
Date:   2016-01-17T06:38:51Z

    woe and ut

commit ab3a961311672d70360fd4a322c42c92945b6ca6
Author: Yuhao Yang <[email protected]>
Date:   2016-01-17T06:38:58Z

    Merge remote-tracking branch 'upstream/master' into woe

commit 11f3f5a12659b0b5028f37e1542d33130ba1459e
Author: Yuhao Yang <[email protected]>
Date:   2016-01-17T16:27:31Z

    add require

commit f1f118b73950415e7326e744b1b17112942976fb
Author: Yuhao Yang <[email protected]>
Date:   2016-01-18T07:02:03Z

    Merge remote-tracking branch 'upstream/master' into woe

commit 8bb38abe79e03490e79cfe31b86607d93818cb27
Author: Yuhao Yang <[email protected]>
Date:   2016-01-18T09:18:27Z

    style fix

----


