Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
Thanks @jkbradley , I will send a follow-up PR for your comments.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/16434
[SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor change
## What changes were proposed in this pull request?
This is a follow-up pr for #15212 to address @jkbradley comments on
Document change
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
hi @jkbradley @yanboliang , I have created a follow up PR for this PR.
https://github.com/apache/spark/pull/16434
I have not added FDR test in ML Suite. The main reason is the current data
set is
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/16452
[ML] fix getThresholds logic error
## What changes were proposed in this pull request?
The logic of getThresholds in ML LogisticRegression is not right, and it
doesn't match wit
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16452
If both threshold and thresholds are not set, the master will return
thresholds.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16452
@sethah , thanks, I got it wrong. I will close it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/16452
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16434
Hi @jkbradley , I have updated this PR per your comments. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16434
Thanks @jkbradley @srowen , I have added a code snippet for verifying with
R.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/15444
[SPARK-17870][MLLIB][ML]Change statistic to pValue for SelectKBest and
SelectPercentile because of DoF difference
## What changes were proposed in this pull request?
For feature selection
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
Hi @yanboliang @srowen , this is the last two feature selection methods
based on ChiSquare, which is similar to the method in scikit learn. But there
is a bug about SelectFDR in scikit learn. I have
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15212#discussion_r84049606
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -72,11 +72,15 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15212#discussion_r84049805
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -72,11 +72,15 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15212#discussion_r84232802
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -243,6 +245,19 @@ class ChiSqSelector @Since("2.1.0")
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
hi @yanboliang , @srowen @jkbradley , I have updated this PR, thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
Hi @yanboliang and @srowen , could you please review whether this PR
includes all your comments. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310501
--- Diff: docs/ml-features.md ---
@@ -1333,14 +1333,14 @@ for more details on the API.
`ChiSqSelector` stands for Chi-Squared feature selection. It
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310797
--- Diff: docs/ml-features.md ---
@@ -1333,14 +1333,14 @@ for more details on the API.
`ChiSqSelector` stands for Chi-Squared feature selection. It
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310862
--- Diff: docs/mllib-feature-extraction.md ---
@@ -227,22 +227,19 @@ both speed and statistical learning behavior.
[`ChiSqSelector`](api/scala/index.html
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85310898
--- Diff: docs/mllib-feature-extraction.md ---
@@ -227,22 +227,19 @@ both speed and statistical learning behavior.
[`ChiSqSelector`](api/scala/index.html
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85311619
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85311677
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -44,67 +44,78 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15647#discussion_r85311930
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,18 +171,19 @@ object ChiSqSelectorModel extends
Loader
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18904
![image](https://user-images.githubusercontent.com/13826327/34948104-2fa1982a-fa47-11e7-9312-f1935cca758b.png)
This is one of my test results.
Now, I am not working on Spark MLLIB, and don
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18904
This is another case.
Table 1 shows the improvement of random tree algorithm with sparse
expression. We can see that when we use sparse expression, I/O can be reduced
by 61% and total run time
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/18904
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18904
Because I don't have the environment to continue this work, I will close
it.
---
-
To unsubscribe, e-mail: reviews-uns
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/19516
Because I don't have the environment to continue this work, I will close
it. Thanks.
---
-
To unsubscribe, e-mail: re
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/19536
Because I don't have the environment to continue this work, I will close
it. Thanks.
---
-
To unsubscribe, e-mail: re
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/19337
Because I don't have the environment to continue this work, I will close
it. Thanks.
---
-
To unsubscribe, e-mail: re
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
Because I don't have the environment to continue this work, I will close
it. Thanks.
---
-
To unsubscribe, e-mail: re
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17739
Because I don't have the environment to continue this work, I will close
it. Thanks.
---
-
To unsubscribe, e-mail: re
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/17739
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/19536
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/19516
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/19337
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/18624
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18904
Thanks @MLnick, I will be glad if you can continue it.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
I find why F2j BLAS is much faster than Native BLAS for xiangrui's method
(use GEMM) here.
https://issues.apache.org/jira/browse/SPARK-21305
---
If your project is set up for it, you can
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18551
[SPARK-21305][ML][MLLIB]Add options to disable multi-threading of native
BLAS
## What changes were proposed in this pull request?
Many ML/MLLIB algorithms use native BLAS (like Intel MKL
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
Thanks, @srowen . I have updated the doc.
I also validated the current option in spark-env.sh, it works.
Thanks.
---
If your project is set up for it, you can reply to this email and have
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18551#discussion_r126323336
--- Diff: docs/ml-guide.md ---
@@ -61,6 +61,11 @@ To configure `netlib-java` / Breeze to use system
optimised binaries, include
project and read the
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
hi @felixcheung , I have tested one case, write a single thread java
program, and call native blas. The performance is much better to disable native
blas multi-threading (the total program
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
hi @srowen , I understand Felix's point. I mean if you only have 1 task in
C/C++, and 2 CPUs, setting native BLAS to use 2 CPUs will be faster. But in JVM
env, even you only have one task, and 2
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
Hi @srowen , Thanks very much for your review.
I will revise the document of this PR to soften the language.
According to my profiling data, I guess, when the native BLAS is loaded (or
when a
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18551#discussion_r126665323
--- Diff: docs/ml-guide.md ---
@@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system
optimised binaries, include
project and read the
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
I have rewritten recommendForAll with BLAS GEMM, and get about 20%-30%
performance improvement.
https://issues.apache.org/jira/browse/SPARK-21389
---
If your project is set up for it, you can
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18620
[MINOR][ML][MLLIB] add poll function for BoundedPriorityQueue
## What changes were proposed in this pull request?
The most of BoundedPriorityQueue usages in ML/MLLIB are:
Get the value of
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Yes, my following PR will use it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Ok, thanks @srowen .
I will create a JIRA, and show the usage and performance comparing.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18624
[SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by gemm with about
50% performance improvement
## What changes were proposed in this pull request?
In Spark 2.2, we have optimized ALS
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18624#discussion_r127214361
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -286,40 +288,124 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Hi @srowen , I have added Test Suite for BoundedPriorityQueue. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
I have submitted PR for ALS optimization with GEMM. and it is ready for
review.
The performance is about 50% improvement comparing with the master method.
https://github.com/apache/spark/pull
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
An user block, after Cartesian, will generate many blocks(Number of Item
blocks), all these blocks should be aggregated. Thanks.
---
If your project is set up for it, you can reply to this email
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
We need the value is in order here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
If no poll, we have to use toArray.sorted, which performance is bad.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
I have checked the results with the master method, the recommendation
results are right.
The master TestSuite is too simple, should be updated. I will update it.
Thanks.
---
If your
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18624#discussion_r127641933
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -286,40 +288,120 @@ object
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18624#discussion_r127669102
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -286,40 +288,120 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Keep it or close it, both is ok for me. We have much discussion on:
https://issues.apache.org/jira/browse/SPARK-21401
---
If your project is set up for it, you can reply to this email and have
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
I have tested much about poll and toArray.sorted.
If the queue is much ordered (suppose offer 2000 times for queue size 20).
Use pq.toArray.sorted is faster.
If the queue is much disordered
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Hi @MLnick ,
pq.toArray.sorted also used in other places, like word2vector and LDA, how
about waiting for my other benchmark results. Then decide to close it or not.
Thanks.
---
If your
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Hi @MLnick , @srowen .
My test showing: pq.poll is not significantly faster than
pq.toArray.sortBy, but significantly faster than pq.toArray.sorted. Seems not
each pq.toArray.sorted (such as
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
I also very confused about this. You can change
https://github.com/apache/spark/pull/18624 to sorted and test.
---
If your project is set up for it, you can reply to this email and have your
reply
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
My micro benchmark (write a program only test pq.toArray.sorted and
pq.Array.sortBy and pq.poll), not find significant performance difference. Only
in the Spark job, there is big difference. Confused
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
I am ok to close this. Thanks @MLnick
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Thanks @srowen , my test also said pq.poll is a little faster on some
cases.
One possible benefit here is if we provide pq.poll, user's first choice may
use pq.poll, not pq.toArray.sorted,
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
Hi @srowen @MLnick @jkbradley @mengxr @yanboliang
Is this change acceptable? if it is acceptable, I will update ALS ML code
following this method. Also update Test Suite, which are too simple
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75851296
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -69,21 +73,26 @@ class ChiSqSelectorModel @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75851527
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -171,14 +180,48 @@ object ChiSqSelectorModel extends
Loader
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75851763
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75852138
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75856793
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r75858450
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -91,8 +137,17 @@ final class ChiSqSelector @Since("1.6.0"
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/14785
[SPARK-17207][MLLIB]fix comparing Vector bug in TestingUtils
## What changes were proposed in this pull request?
fix comparing Vector bug in TestingUtils.
There is the same bug for
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14785#discussion_r76035442
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala ---
@@ -154,7 +154,11 @@ object TestingUtils {
*/
def absTol
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14785#discussion_r76037277
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala ---
@@ -154,7 +154,11 @@ object TestingUtils {
*/
def absTol
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76041373
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76059026
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76059098
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76065118
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76068163
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -189,11 +232,21 @@ class ChiSqSelector @Since("
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Sure, I can update the Python API.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14785
Sure, I will fix it, and add test cases. thanks. @dbtsai ,
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14785
Hi @dbtsai , PR 2294 added Matrix comparing in TestingUtils, but did not
add any test cases in TestingUtilsSuite. I did not add test cases for Matrix
comparing in the PR either.
If Matrix
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/14824
[ML][MLLIB]The require condition and message doesn't match in SparseMatrix.
## What changes were proposed in this pull request?
The require condition and message doesn't matc
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14824#discussion_r76417535
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -455,9 +455,11 @@ class SparseMatrix @Since("2.0.0") (
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14824#discussion_r76421555
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala ---
@@ -454,10 +454,15 @@ class SparseMatrix @Since("
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen , I have added Python API and test cases for ChiSqSelector.
Could you kindly review it again. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r76624379
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
"""
Creates a ChiSquared f
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77137991
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
"""
Creates a ChiSquared f
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77473261
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
"""
Creates a ChiSquared f
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77532997
--- Diff: python/pyspark/mllib/feature.py ---
@@ -271,29 +271,74 @@ def transform(self, vector):
"""
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/14597#discussion_r77536408
--- Diff: python/pyspark/mllib/feature.py ---
@@ -305,7 +350,12 @@ def fit(self, data):
treated as categorical for each distinct
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @yanboliang , could you please kindly review the python code of this PR.
Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/15058
[MLLIB]Add setBins for BinaryClassificationMetrics
## What changes were proposed in this pull request?
Add a setBins method for BinaryClassificationMetrics.
BinaryClassificationMetrics
1 - 100 of 270 matches
Mail list logo