[jira] [Commented] (SPARK-12875) Add Weight of Evidence and Information value to Spark.ml as a feature transformer
[ https://issues.apache.org/jira/browse/SPARK-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098946#comment-16098946 ]

yuhao yang commented on SPARK-12875:
------------------------------------

Close stale jira.

> Add Weight of Evidence and Information value to Spark.ml as a feature transformer
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-12875
>                 URL: https://issues.apache.org/jira/browse/SPARK-12875
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang
>            Priority: Minor
>
> As a feature transformer, WOE and IV enable one to:
> Consider each variable’s independent contribution to the outcome.
> Detect linear and non-linear relationships.
> Rank variables in terms of "univariate" predictive strength.
> Visualize the correlations between the predictive variables and the binary outcome.
> http://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/ gives a good introduction to WoE and IV.
> The Weight of Evidence or WoE value provides a measure of how well a grouping of a feature is able to distinguish between the values of a binary response (e.g. "good" versus "bad"). It is widely used for grouping continuous features or mapping categorical features to continuous values. It is computed from the basic odds ratio:
> (Distribution of positive Outcomes) / (Distribution of negative Outcomes)
> where Distribution refers to the proportion of positive or negative outcomes in the respective group, relative to the column totals.
> The WoE recoding of features is particularly well suited for subsequent modeling using Logistic Regression or MLP.
> In addition, the information value or IV can be computed based on WoE, which is a popular technique to select variables in a predictive model.
> TODO: Currently we support only calculation for categorical features. Add an estimator to estimate the proper grouping for continuous features.
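The computation described in the issue can be sketched in plain Python for a single categorical feature. This is only an illustration, not the code from the pull request and not a Spark API. It assumes the common definitions WoE = ln(distribution of positives / distribution of negatives) per category and IV = sum over categories of (distribution of positives - distribution of negatives) * WoE. The function name `woe_iv` and the choice to raise an error on categories with no positives or no negatives are assumptions made for this sketch.

```python
import math
from collections import Counter


def woe_iv(categories, labels):
    """Return ({category: WoE}, IV) for a categorical feature and 0/1 labels.

    Hypothetical helper for illustration; not part of Spark.
    """
    pos = Counter()  # count of positive outcomes per category
    neg = Counter()  # count of negative outcomes per category
    for cat, y in zip(categories, labels):
        if y == 1:
            pos[cat] += 1
        else:
            neg[cat] += 1

    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    woe = {}
    iv = 0.0
    for cat in sorted(set(pos) | set(neg)):
        # Assumption: refuse categories where the ratio or its log is undefined.
        if pos[cat] == 0 or neg[cat] == 0:
            raise ValueError(f"category {cat!r} lacks positives or negatives")
        # Proportion of this category relative to the column totals.
        dist_pos = pos[cat] / total_pos
        dist_neg = neg[cat] / total_neg
        woe[cat] = math.log(dist_pos / dist_neg)
        iv += (dist_pos - dist_neg) * woe[cat]
    return woe, iv


# Toy data: category "a" has 2 positives and 1 negative, "b" the reverse.
categories = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 1, 0, 0]
woe, iv = woe_iv(categories, labels)
print(woe, iv)
```

On this toy data the two categories get WoE values of equal magnitude and opposite sign, since "a" holds two thirds of the positives and one third of the negatives while "b" is the mirror image. A real implementation would need a policy for empty cells, such as smoothing, instead of raising.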
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12875) Add Weight of Evidence and Information value to Spark.ml as a feature transformer
[ https://issues.apache.org/jira/browse/SPARK-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105000#comment-15105000 ]

Apache Spark commented on SPARK-12875:
--------------------------------------

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/10803