Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14449
Hi @srowen, thanks for your comment.
I agree with your comment: users can get the number of features without the
percentage method. For the user experience, sometimes the percentage method
seems
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/14449
[SPARK-16843][MLLIB] add the percentage ChiSquareSelector feature
## What changes were proposed in this pull request?
add the percentage ChiSquareSelector feature
## How
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74404348
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74412725
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74400235
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74416260
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14449
Hi @srowen , I also plan to submit some PRs for feature selection methods
based on univariate statistical tests, like the methods in scikit-learn:
SelectFpr (using false positive rate), SelectFdr
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @avulanov . In general, FPR feature selection should not modify the
code of the existing ChiSqSelector, as we have implemented in this PR. But if we
need to reuse the ChiSqTestResult
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74721128
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74740698
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74742525
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74743026
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/14597
Fpr chi square
## What changes were proposed in this pull request?
Univariate feature selection works by selecting the best features based on
univariate statistical tests. False Positive
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74396249
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r74397115
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi, @srowen , I can modify the implementation in .ml to accommodate the new
params. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/14597
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75314112
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75610472
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75613576
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75634659
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -54,6 +54,29 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75636510
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +227,20 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75419052
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75661562
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +227,20 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14785#discussion_r76035442
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala ---
@@ -154,7 +154,11 @@ object TestingUtils {
*/
def absTol
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/14785
[SPARK-17207][MLLIB]fix comparing Vector bug in TestingUtils
## What changes were proposed in this pull request?
fix comparing Vector bug in TestingUtils.
There is the same bug
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14785#discussion_r76037277
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala ---
@@ -154,7 +154,11 @@ object TestingUtils {
*/
def absTol
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76041373
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14785
Sure, I will fix it and add test cases. Thanks, @dbtsai.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14785
Hi @dbtsai , PR 2294 added Matrix comparison in TestingUtils, but did not
add any test cases in TestingUtilsSuite. I did not add test cases for Matrix
comparison in the PR either.
If Matrix
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77536408
--- Diff: python/pyspark/mllib/feature.py ---
@@ -305,7 +350,12 @@ def fit(self, data):
treated as categorical for each distinct
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77532997
--- Diff: python/pyspark/mllib/feature.py ---
@@ -271,29 +271,74 @@ def transform(self, vector):
"""
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @yanboliang , could you please kindly review the python code of this PR.
Thanks.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77137991
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
"""
Creates a ChiSquared f
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77473261
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
"""
Creates a ChiSquared f
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15058#discussion_r78355383
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -43,10 +43,13 @@ import
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15058#discussion_r78355993
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -43,10 +43,13 @@ import
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15058#discussion_r78358229
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -43,10 +43,13 @@ import
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15058#discussion_r78362121
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -43,10 +43,13 @@ import
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15058#discussion_r78356355
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -43,10 +43,13 @@ import
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen , thanks.
This is my first PR. I have learned much from you. Thanks very much.
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/15058
[MLLIB]Add setBins for BinaryClassificationMetrics
## What changes were proposed in this pull request?
Add a setBins method for BinaryClassificationMetrics.
BinaryClassificationMetrics
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15058#discussion_r78360606
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
---
@@ -43,10 +43,13 @@ import
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen , we have moved isSorted into ChiSqSelectorModel. There is an error
message "*** method isSorted(Array[Int])Boolean in class
org.apache.spark.mllib.feature.ChiSqSelectorModel does not
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen , the Python style failure is fixed. Thanks.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen , I have updated the code to fix some MiMa test errors. Could you
please review it again? Thanks.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15214
Hi @yanboliang , got it. Thanks.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15277#discussion_r80961052
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -57,22 +69,21 @@ class ChiSqSelectorModel @Since("
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15299
Hi @srowen , is @transient needed for val selectedFeatures or val
filterIndices (one of them)?
Would it be good to define filterIndices as lazy?
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/15058
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15058
Sure, I will close it now.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Thanks very much. I am on holiday now and will update the code this Sunday.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r79405860
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -54,11 +55,44 @@ private[feature] trait ChiSqSelectorParams extends
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/15212
[MLLIB][ML]add feature selector method based on: False Discovery Rate (FDR)
and Family wise error rate (FWE)
## What changes were proposed in this pull request?
Univariate feature selection
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15214#discussion_r80277277
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -143,13 +149,13 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15214#discussion_r80278819
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ---
@@ -76,7 +76,7 @@ class ChiSqSelectorSuite extends SparkFunSuite
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15214#discussion_r80277785
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -160,6 +166,12 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15214
Hi @srowen and @yanboliang , thanks for your follow-up PR.
I partly agree with your comments on 17017.
**1. "if users both set numTopFeatures and percentile, it will train
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15214#discussion_r80277470
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -143,13 +149,13 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
No problem. Thanks very much, @srowen .
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen, I have added the parameter to control the feature selection
type.
The usage is like this:
**var selector = new ChiSqSelector()
var model = selector.fit(df) // by default
GitHub user mpjlu reopened a pull request:
https://github.com/apache/spark/pull/14597
[SPARK-17017][MLLIB] add a chiSquare Selector based on False Positive Rate
(FPR) test
## What changes were proposed in this pull request?
Univariate feature selection works by selecting
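
As a rough illustration of the selection modes discussed in these threads (top-k, percentage/percentile, and False Positive Rate), here is a minimal plain-Python sketch that works on a list of per-feature p-values. It is not the Spark implementation: the function names `select_k_best`, `select_percentile`, and `select_fpr` are hypothetical, the p-values are made up for the example, and the strict `<` comparison and the truncating `int(...)` are assumptions.

```python
from typing import List, Sequence


def select_k_best(p_values: Sequence[float], k: int) -> List[int]:
    """Return indices of the k features with the smallest p-values.

    The result is sorted by feature index so a downstream filter can
    walk the indices in order (hypothetical helper, not Spark's API).
    """
    ranked = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return sorted(ranked[:k])


def select_percentile(p_values: Sequence[float], percentile: float) -> List[int]:
    """Keep a fraction of the features (percentile in [0, 1]).

    Assumption: the feature count is truncated toward zero.
    """
    k = int(len(p_values) * percentile)
    return select_k_best(p_values, k)


def select_fpr(p_values: Sequence[float], alpha: float) -> List[int]:
    """Keep every feature whose p-value is below alpha.

    Assumption: strict comparison; the number of kept features is not fixed.
    """
    return [i for i, p in enumerate(p_values) if p < alpha]


# Made-up p-values for five features, for illustration only.
example_p_values = [0.20, 0.01, 0.04, 0.50, 0.03]
top_two = select_k_best(example_p_values, 2)
top_half = select_percentile(example_p_values, 0.5)
below_alpha = select_fpr(example_p_values, 0.05)
```

The top-k and percentile modes fix how many features survive, while the FPR mode fixes the significance threshold and lets the count vary. That difference is presumably why a single selector needs a parameter to pick one mode when several thresholds are set.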
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75286500
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +177,47 @@ object ChiSqSelectorModel extends
Loader
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75294150
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75286355
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75286399
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75289389
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +228,35 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75286558
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -54,6 +54,29 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen , I will update the Python API to match these changes. For now, the
current Python API does not conflict with the changes.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14824#discussion_r76421555
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -454,10 +454,15 @@ class SparseMatrix @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14824#discussion_r76417535
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -455,9 +455,11 @@ class SparseMatrix @Since("2.0.0") (
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/14824
[ML][MLLIB]The require condition and message doesn't match in SparseMatrix.
## What changes were proposed in this pull request?
The require condition and message doesn't match
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75802754
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,47 @@ object ChiSqSelectorModel extends
Loader
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75802771
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,47 @@ object ChiSqSelectorModel extends
Loader
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen , I have added the Python API and test cases for ChiSqSelector.
Could you kindly review it again? Thanks.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76624379
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
"""
Creates a ChiSquared f
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75851527
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,48 @@ object ChiSqSelectorModel extends
Loader
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75851763
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75852138
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75851296
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -69,21 +73,26 @@ class ChiSqSelectorModel @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75856793
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75858450
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76065118
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76068163
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Sure, I can update the Python API.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76059098
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76059026
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
hi @srowen @yanboliang , I have updated this PR. Thanks
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15277
Sorting in transform would cause the sort to run too many times,
so this looks good to me. Thanks.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15214
Hi @srowen .
My understanding of yanbo's comments here is:
if a user uses ChiSqSelector like this:
model1 = new ChiSqSelector().setFPR(0.05).setKBest(100).fit(data)
model2 = new ChiSqSelector
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15214
Thanks, this looks good to me.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15214
Hi @srowen , sorry for forgetting to update the doc and python/ml/feature.py
in the last PR.
This PR has added ml/feature.py. It looks good to me.
Thanks
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85311619
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85311677
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310898
--- Diff: docs/mllib-feature-extraction.md ---
@@ -227,22 +227,19 @@ both speed and statistical learning behavior.
[`ChiSqSelector`](api/scala/index.html
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310797
--- Diff: docs/ml-features.md ---
@@ -1333,14 +1333,14 @@ for more details on the API.
`ChiSqSelector` stands for Chi-Squared feature selection
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310862
--- Diff: docs/mllib-feature-extraction.md ---
@@ -227,22 +227,19 @@ both speed and statistical learning behavior.
[`ChiSqSelector`](api/scala/index.html
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85311930
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,18 +171,19 @@ object ChiSqSelectorModel extends
Loader
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310501
--- Diff: docs/ml-features.md ---
@@ -1333,14 +1333,14 @@ for more details on the API.
`ChiSqSelector` stands for Chi-Squared feature selection
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15212#discussion_r84049805
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -72,11 +72,15 @@ private[feature] trait ChiSqSelectorParams extends