[jira] [Commented] (SPARK-18441) Add Smote in spark mlib and ml

2017-06-28 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067494#comment-16067494
 ] 

yuhao yang commented on SPARK-18441:


Moved the Smote code to 
https://gist.github.com/hhbyyh/346467373014943a7f20df208caeb19b

> Add Smote in spark mlib and ml
> --
>
> Key: SPARK-18441
> URL: https://issues.apache.org/jira/browse/SPARK-18441
> Project: Spark
>  Issue Type: Wish
>  Components: ML, MLlib
>Affects Versions: 2.0.1
>Reporter: lichenglin
>
> Please add SMOTE to Spark MLlib and ML to handle imbalanced training data in 
> classification.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18441) Add Smote in spark mlib and ml

2016-11-16 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670210#comment-15670210
 ] 

Nick Pentreath commented on SPARK-18441:


Yes, it would be good to understand what this is all about. Perhaps some 
techniques for dealing with class imbalance could live within Spark. However, 
take a look at https://github.com/scikit-learn-contrib/imbalanced-learn for 
example - there is a contrib module focused on class imbalance. Something 
similar would be very interesting as a package.




[jira] [Commented] (SPARK-18441) Add Smote in spark mlib and ml

2016-11-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670086#comment-15670086
 ] 

Sean Owen commented on SPARK-18441:
---

Can this JIRA be expanded to explain what Smote is, with a reference, and why 
it should be in Spark? In general, these should be external packages.




[jira] [Commented] (SPARK-18441) Add Smote in spark mlib and ml

2016-11-15 Thread lichenglin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669125#comment-15669125
 ] 

lichenglin commented on SPARK-18441:


Thanks, it works now.




[jira] [Commented] (SPARK-18441) Add Smote in spark mlib and ml

2016-11-15 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669110#comment-15669110
 ] 

yuhao yang commented on SPARK-18441:


It should work with Spark 2.0+. If you have modified the code, please ensure 
it's still in the package org.apache.spark..., since BLAS is declared 
private[spark] and is only accessible from within that package.

Feel free to contact me via email (yuhao.y...@intel.com).




[jira] [Commented] (SPARK-18441) Add Smote in spark mlib and ml

2016-11-15 Thread lichenglin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669050#comment-15669050
 ] 

lichenglin commented on SPARK-18441:


Thanks for your reply.
May I ask which version of Spark this Smote implementation uses?
I'm using Spark 2.0.1, but I get errors related to 
"org.apache.spark.ml.linalg.BLAS": Eclipse can't resolve this class.




[jira] [Commented] (SPARK-18441) Add Smote in spark mlib and ml

2016-11-15 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666539#comment-15666539
 ] 

yuhao yang commented on SPARK-18441:


Hi [~licl], I have an implementation of Smote at 
https://github.com/hhbyyh/Test/blob/master/Smote/src/SmoteTest.scala . I hope 
it helps. 
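
For readers unfamiliar with the technique: SMOTE creates synthetic 
minority-class samples by interpolating between a minority point and one of 
its k nearest minority neighbours. The following is a minimal plain-Python 
sketch of that idea only. It is not the code in the linked file, it does not 
use Spark, and the function name `smote` and its parameters are hypothetical 
names chosen for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class rows.

    minority: list of equal-length lists of floats (minority-class rows only).
    All names here are illustrative assumptions, not a Spark API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority row as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k nearest minority neighbours, excluding the base itself.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

The brute-force neighbour search is quadratic in the number of minority rows, 
which is fine for a sketch but would presumably need a distributed 
nearest-neighbour step in a Spark implementation.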

Right now Spark 2.1 is already in the QA phase (feature lockdown), so the 
earliest possible window would be 2.2. Before that, we need to resolve some 
issues. For example, if SmoteSampler is a feature transformer, we need to find 
a way to disable it in the pipeline during the predict phase. Besides, I guess 
we need to collect more opinions from the community to see whether this is a 
common requirement. 
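
The pipeline concern above can be illustrated with a small plain-Python 
sketch, assuming a design in which the sampling stage resamples during fit and 
passes data through unchanged during predict. This is not Spark's Pipeline 
API: the classes `OversamplingStage`, `CountingModel` and `Pipeline` are 
hypothetical, and the stage duplicates minority rows at random rather than 
running SMOTE, to keep the example short.

```python
import random


class OversamplingStage:
    """Hypothetical stage: balances classes during fit, is a no-op at predict."""

    def __init__(self, seed=0):
        self.seed = seed

    def fit_resample(self, rows, labels):
        rng = random.Random(self.seed)
        by_label = {}
        for row, label in zip(rows, labels):
            by_label.setdefault(label, []).append(row)
        # Grow every class to the size of the largest class.
        target = max(len(group) for group in by_label.values())
        out_rows, out_labels = [], []
        for label, group in by_label.items():
            extra = [rng.choice(group) for _ in range(target - len(group))]
            for row in group + extra:
                out_rows.append(row)
                out_labels.append(label)
        return out_rows, out_labels

    def transform(self, rows):
        # Predict-time path: the data must not be resampled.
        return rows


class CountingModel:
    """Stand-in model that only records the class counts it was trained on."""

    def fit(self, rows, labels):
        self.class_counts = {}
        for label in labels:
            self.class_counts[label] = self.class_counts.get(label, 0) + 1
        return self

    def predict(self, rows):
        # Dummy prediction: the smallest label, once per input row.
        label = min(self.class_counts)
        return [label for _ in rows]


class Pipeline:
    """Hypothetical two-stage pipeline: sampler, then model."""

    def __init__(self, sampler, model):
        self.sampler = sampler
        self.model = model

    def fit(self, rows, labels):
        resampled_rows, resampled_labels = self.sampler.fit_resample(rows, labels)
        self.model.fit(resampled_rows, resampled_labels)
        return self

    def predict(self, rows):
        return self.model.predict(self.sampler.transform(rows))
```

With four rows of class 0 and one row of class 1, the model is trained on four 
rows of each class, while `predict` returns exactly one prediction per input 
row.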
