[GitHub] spark issue #14449: [SPARK-16843][MLLIB] add the percentage ChiSquareSelecto...

2016-08-02 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14449 Hi @srowen, thanks for your comment. I agree with your comment: users can get the number of features without the percentage method. For the user experience, sometimes the percentage method seems

[GitHub] spark pull request #14449: [SPARK-16843][MLLIB] add the percentage ChiSquare...

2016-08-01 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/14449 [SPARK-16843][MLLIB] add the percentage ChiSquareSelector feature ## What changes were proposed in this pull request? add the percentage ChiSquareSelector feature ## How
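
The snippet above is cut off before it describes the implementation, so the following is only a minimal sketch of what percentage-based selection could look like: rank features by their chi-square statistic and keep the top fraction. The function name `select_top_fraction`, the sample scores, and the choice to round the feature count down are assumptions for illustration, not the PR's actual code.

```python
def select_top_fraction(scores, fraction):
    """Return the sorted indices of the top `fraction` of features by score.

    `scores` holds one chi-square statistic per feature (higher is better).
    Hypothetical helper; not taken from the pull request.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    # Assumption: the number of kept features is rounded down.
    keep = int(len(scores) * fraction)
    # Rank feature indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the kept indices in ascending order.
    return sorted(ranked[:keep])


# Made-up scores for ten features.
chi2_scores = [0.5, 12.3, 3.1, 8.8, 0.1, 6.0, 1.2, 9.9, 2.2, 4.4]
print(select_top_fraction(chi2_scores, 0.5))
```

With a fraction of 0.5 and ten features, the sketch keeps the five highest-scoring features, so the caller does not need to know the total feature count in advance.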

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74404348 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74412725 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74400235 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74416260 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark issue #14449: [SPARK-16843][MLLIB] add the percentage ChiSquareSelecto...

2016-08-04 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14449 Hi @srowen, I also plan to submit some PRs for feature selection methods based on univariate statistical tests, like the methods in scikit-learn: SelectFpr (using false positive rate), SelectFdr

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-14 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @avulanov . In general, FPR feature selection should not modify the code of existing ChiSqSelector, as we have implemented in this PR. But if we need to reuse the ChiSqTestResult

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-14 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74721128 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-15 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74740698 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-15 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74742525 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-15 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74743026 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: Fpr chi square

2016-08-11 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/14597 Fpr chi square ## What changes were proposed in this pull request? Univariate feature selection works by selecting the best features based on univariate statistical tests. False Positive
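
The description above is truncated at "False Positive", so as a hedged illustration only: a false-positive-rate selector is commonly implemented by keeping every feature whose test p-value falls below a significance level. The name `select_fpr`, the default `alpha`, and the strict `<` comparison are assumptions made for this sketch; the p-values are made up.

```python
def select_fpr(p_values, alpha=0.05):
    """Keep indices of features whose p-value is strictly below `alpha`.

    Hypothetical helper; the strict comparison is an assumption.
    """
    return [i for i, p in enumerate(p_values) if p < alpha]


# One made-up p-value per feature, e.g. from a chi-square test.
p_values = [0.001, 0.20, 0.04, 0.05, 0.7]
print(select_fpr(p_values))
```

Unlike a top-k selector, the number of features kept here is not fixed in advance; it depends on how many tests pass the threshold.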

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74396249 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r74397115 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi, @srowen , I can modify the implementation in .ml to accommodate the new params. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-16 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/14597

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75314112 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-21 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75610472 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-21 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75613576 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75634659 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -54,6 +54,29 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75636510 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +227,20 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75419052 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75661562 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +227,20 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in T...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14785#discussion_r76035442 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala --- @@ -154,7 +154,11 @@ object TestingUtils { */ def absTol

[GitHub] spark pull request #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in T...

2016-08-24 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/14785 [SPARK-17207][MLLIB]fix comparing Vector bug in TestingUtils ## What changes were proposed in this pull request? fix comparing Vector bug in TestingUtils. There is the same bug

[GitHub] spark pull request #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in T...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14785#discussion_r76037277 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala --- @@ -154,7 +154,11 @@ object TestingUtils { */ def absTol

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76041373 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...

2016-08-24 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14785 Sure, I will fix it and add test cases. Thanks, @dbtsai.

[GitHub] spark issue #14785: [SPARK-17207][MLLIB]fix comparing Vector bug in TestingU...

2016-08-25 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14785 Hi @dbtsai , PR 2294 added Matrix comparing in TestingUtils, but did not add any test cases in TestingUtilsSuite. I did not add test cases for Matrix comparing in the PR either. If Matrix

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-05 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77536408 --- Diff: python/pyspark/mllib/feature.py --- @@ -305,7 +350,12 @@ def fit(self, data): treated as categorical for each distinct

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-05 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77532997 --- Diff: python/pyspark/mllib/feature.py --- @@ -271,29 +271,74 @@ def transform(self, vector): """

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-06 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @yanboliang, could you please review the Python code of this PR? Thanks.

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-01 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77137991 --- Diff: python/pyspark/mllib/feature.py --- @@ -276,24 +276,64 @@ class ChiSqSelector(object): """ Creates a ChiSquared f

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-05 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r77473261 --- Diff: python/pyspark/mllib/feature.py --- @@ -276,24 +276,64 @@ class ChiSqSelector(object): """ Creates a ChiSquared f

[GitHub] spark pull request #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassif...

2016-09-12 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15058#discussion_r78355383 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -43,10 +43,13 @@ import

[GitHub] spark pull request #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassif...

2016-09-12 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15058#discussion_r78355993 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -43,10 +43,13 @@ import

[GitHub] spark pull request #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassif...

2016-09-12 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15058#discussion_r78358229 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -43,10 +43,13 @@ import

[GitHub] spark pull request #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassif...

2016-09-12 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15058#discussion_r78362121 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -43,10 +43,13 @@ import

[GitHub] spark pull request #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassif...

2016-09-12 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15058#discussion_r78356355 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -43,10 +43,13 @@ import

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-12 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, thanks. This is my first PR, and I learned much from you. Thanks very much.

[GitHub] spark pull request #15058: [MLLIB]Add setBins for BinaryClassificationMetric...

2016-09-12 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/15058 [MLLIB]Add setBins for BinaryClassificationMetrics ## What changes were proposed in this pull request? Add a setBins method for BinaryClassificationMetrics. BinaryClassificationMetrics

[GitHub] spark pull request #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassif...

2016-09-12 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15058#discussion_r78360606 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala --- @@ -43,10 +43,13 @@ import

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, we have moved isSorted in ChiSqSelectorModel. There is an error message "*** method isSorted(Array[Int])Boolean in class org.apache.spark.mllib.feature.ChiSqSelectorModel does not

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, the Python style failure is fixed. Thanks.

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-13 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I have updated the code to fix some MiMa test errors. Could you please review it again? Thanks.

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-24 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15214 Hi @yanboliang , got it. Thanks.

[GitHub] spark pull request #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performanc...

2016-09-28 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15277#discussion_r80961052 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("

[GitHub] spark issue #15299: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-29 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15299 Hi @srowen, is @transient needed for one of val selectedFeatures or val filterIndices? Would it be good to define filterIndices as lazy?

[GitHub] spark pull request #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassif...

2016-09-15 Thread mpjlu
Github user mpjlu closed the pull request at: https://github.com/apache/spark/pull/15058

[GitHub] spark issue #15058: [SPARK-17505][MLLIB]Add setBins for BinaryClassification...

2016-09-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15058 Sure, I will close it now.

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-15 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Thanks very much. I am on holiday now and will update the code this Sunday.

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-09-19 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r79405860 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -54,11 +55,44 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15212: [MLLIB][ML]add feature selector method based on: ...

2016-09-23 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/15212 [MLLIB][ML]add feature selector method based on: False Discovery Rate (FDR) and Family wise error rate (FWE) ## What changes were proposed in this pull request? Univariate feature selection
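
The PR text above is truncated, so the sketch below only illustrates the two named criteria as they are usually defined: the Benjamini-Hochberg step-up procedure for the false discovery rate, and a Bonferroni-style threshold of `alpha / n` for the family-wise error rate. The function names, the default `alpha`, and the exact comparison operators are assumptions; this is not the PR's implementation, and the p-values are made up.

```python
def select_fdr(p_values, alpha=0.05):
    """Benjamini-Hochberg: keep features whose p-value is at or below the
    largest sorted p-value p_(k) that satisfies p_(k) <= alpha * k / n."""
    n = len(p_values)
    ordered = sorted(p_values)
    passing = [p for rank, p in enumerate(ordered, start=1)
               if p <= alpha * rank / n]
    if not passing:
        return []
    cutoff = max(passing)
    return [i for i, p in enumerate(p_values) if p <= cutoff]


def select_fwe(p_values, alpha=0.05):
    """Bonferroni-style: keep features whose p-value is below alpha / n."""
    n = len(p_values)
    return [i for i, p in enumerate(p_values) if p < alpha / n]


# One made-up p-value per feature.
p_values = [0.6, 0.001, 0.041, 0.015, 0.029]
print(select_fdr(p_values))  # indices kept under the FDR criterion
print(select_fwe(p_values))  # indices kept under the FWE criterion
```

On the same p-values the FWE criterion is the stricter of the two, since it divides the significance level by the number of features for every test rather than scaling it by rank.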

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80277277 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -143,13 +149,13 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80278819 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala --- @@ -76,7 +76,7 @@ class ChiSqSelectorSuite extends SparkFunSuite

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80277785 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -160,6 +166,12 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-23 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15214 Hi @srowen and @yanboliang, thanks for your follow-up PR. I partly agree with your comments on 17017. **1. "if users both set numTopFeatures and percentile, it will train

[GitHub] spark pull request #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSel...

2016-09-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15214#discussion_r80277470 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -143,13 +149,13 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-09-20 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 No problem. Thanks very much, @srowen.

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-17 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I have added the parameter to control the feature selection type. The usage is like this: **var selector = new ChiSqSelector() var model = selector.fit(df) // by default

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-17 Thread mpjlu
GitHub user mpjlu reopened a pull request: https://github.com/apache/spark/pull/14597 [SPARK-17017][MLLIB] add a chiSquare Selector based on False Positive Rate (FPR) test ## What changes were proposed in this pull request? Univariate feature selection works by selecting

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75286500 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,14 +177,47 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75294150 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75286355 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75286399 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75289389 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +228,35 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-18 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75286558 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -54,6 +54,29 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-08-18 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I will update the Python API to match these changes. For now, the current Python API does not conflict with the changes.

[GitHub] spark pull request #14824: [ML][MLLIB]The require condition and message does...

2016-08-26 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14824#discussion_r76421555 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -454,10 +454,15 @@ class SparseMatrix @Since("

[GitHub] spark pull request #14824: [ML][MLLIB]The require condition and message does...

2016-08-26 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14824#discussion_r76417535 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -455,9 +455,11 @@ class SparseMatrix @Since("2.0.0") (

[GitHub] spark pull request #14824: [ML][MLLIB]The require condition and message does...

2016-08-26 Thread mpjlu
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/14824 [ML][MLLIB]The require condition and message doesn't match in SparseMatrix. ## What changes were proposed in this pull request? The require condition and message doesn't match

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75802754 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,14 +180,47 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-22 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75802771 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,14 +180,47 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-08-29 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I have added the Python API and test cases for ChiSqSelector. Could you kindly review it again? Thanks.

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-29 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76624379 --- Diff: python/pyspark/mllib/feature.py --- @@ -276,24 +276,64 @@ class ChiSqSelector(object): """ Creates a ChiSquared f

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75851527 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,14 +180,48 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75851763 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75852138 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75851296 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -69,21 +73,26 @@ class ChiSqSelectorModel @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75856793 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-23 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r75858450 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76065118 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76068163 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-08-24 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Sure, I can update the Python API. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76059098 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-24 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/14597#discussion_r76059026 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -189,11 +232,21 @@ class ChiSqSelector @Since("

[GitHub] spark issue #15212: [SPARK-17645][MLLIB][ML][WIP]add feature selector method...

2016-09-27 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15212 Hi @srowen @yanboliang, I have updated this PR. Thanks.

[GitHub] spark issue #15277: [SPARK-17704][ML][MLlib] ChiSqSelector performance impro...

2016-09-28 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15277 Sorting inside transform would repeat the sort too many times, so this looks good to me. Thanks.

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-25 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15214 Hi @srowen. My understanding of yanbo's comments here is: if a user uses ChiSqSelector like this: model1 = new ChiSqSelector().setFPR(0.05).setKBest(100).fit(data) model2 = new ChiSqSelector

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-25 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15214 Thanks, this looks good to me.

[GitHub] spark issue #15214: [SPARK-17017][Follow-up][ML] Refactor of ChiSqSelector a...

2016-09-25 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/15214 Hi @srowen, sorry for forgetting to update the doc and python/ml/feature.py in the last PR. This PR has added ml/feature.py. It looks good to me. Thanks.

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85311619 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85311677 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310898 --- Diff: docs/mllib-feature-extraction.md --- @@ -227,22 +227,19 @@ both speed and statistical learning behavior. [`ChiSqSelector`](api/scala/index.html

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310797 --- Diff: docs/ml-features.md --- @@ -1333,14 +1333,14 @@ for more details on the API. `ChiSqSelector` stands for Chi-Squared feature selection

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310862 --- Diff: docs/mllib-feature-extraction.md --- @@ -227,22 +227,19 @@ both speed and statistical learning behavior. [`ChiSqSelector`](api/scala/index.html

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85311930 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -171,18 +171,19 @@ object ChiSqSelectorModel extends Loader

[GitHub] spark pull request #15647: [SPARK-18088][ML] Various ChiSqSelector cleanups

2016-10-27 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15647#discussion_r85310501 --- Diff: docs/ml-features.md --- @@ -1333,14 +1333,14 @@ for more details on the API. `ChiSqSelector` stands for Chi-Squared feature selection

[GitHub] spark pull request #15212: [SPARK-17645][MLLIB][ML][WIP]add feature selector...

2016-10-19 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/15212#discussion_r84049805 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala --- @@ -72,11 +72,15 @@ private[feature] trait ChiSqSelectorParams extends
