[jira] [Commented] (SPARK-13223) Add stratified sampling to ML feature engineering
[ https://issues.apache.org/jira/browse/SPARK-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098933#comment-16098933 ]

yuhao yang commented on SPARK-13223:
------------------------------------

Closing this since it has been overlooked for some time and can easily be implemented with #17583.

> Add stratified sampling to ML feature engineering
> -------------------------------------------------
>
>                 Key: SPARK-13223
>                 URL: https://issues.apache.org/jira/browse/SPARK-13223
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang
>            Priority: Minor
>
> I found it useful to add a sampling transformer while working on a fraud
> detection case. It can be used for resampling or oversampling, which in turn
> is required for ensemble methods and for processing unbalanced data.
> Internally, it invokes the sampleByKey operation on pair RDDs.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
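For readers unfamiliar with the idea behind sampleByKey, the following is a minimal plain-Python sketch of stratified sampling without replacement: every record is kept independently with a probability chosen per key. It is not the Spark implementation and not the transformer proposed in this issue. The function name sample_by_key, the "legit"/"fraud" labels, and the fractions are assumptions made up for illustration.

```python
import random
from collections import Counter


def sample_by_key(pairs, fractions, seed=None):
    """Keep each (key, value) pair independently with probability fractions[key].

    Hypothetical helper for illustration only; it mimics the idea of per-key
    Bernoulli sampling and is not Spark's API. A key that is missing from
    `fractions` raises KeyError.
    """
    for key, fraction in fractions.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for key {key!r} must be in [0, 1]")
    rng = random.Random(seed)
    sampled = []
    for key, value in pairs:
        # rng.random() is in [0, 1), so a fraction of 1.0 keeps every record
        # and a fraction of 0.0 keeps none.
        if rng.random() < fractions[key]:
            sampled.append((key, value))
    return sampled


# Made-up unbalanced data set: 1000 "legit" records and 20 "fraud" records.
data = [("legit", i) for i in range(1000)] + [("fraud", i) for i in range(20)]

# Downsample the majority class and keep the whole minority class.
fractions = {"legit": 0.1, "fraud": 1.0}

sampled = sample_by_key(data, fractions, seed=42)
print(Counter(key for key, _ in sampled))
```

This sketch only downsamples. Oversampling a minority class needs sampling with replacement, which Spark's sampleByKey exposes through its withReplacement argument and which is left out here.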
[jira] [Commented] (SPARK-13223) Add stratified sampling to ML feature engineering
[ https://issues.apache.org/jira/browse/SPARK-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135689#comment-15135689 ]

Apache Spark commented on SPARK-13223:
--------------------------------------

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/11102

> Add stratified sampling to ML feature engineering
> -------------------------------------------------
>
>                 Key: SPARK-13223
>                 URL: https://issues.apache.org/jira/browse/SPARK-13223
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang
>            Priority: Minor
>
> I found it useful to add a sampling transformer while working on a fraud
> detection case. It can be used for resampling or oversampling, which in turn
> is required for ensemble methods and for processing unbalanced data.
> Internally, it invokes the sampleByKey operation on pair RDDs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)