Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15212#discussion_r84049606
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -72,11 +72,15 @@ private[feature] trait ChiSqSelectorParams extends
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15212#discussion_r84232802
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -243,6 +245,19 @@ class ChiSqSelector @Since("2.1.0")
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
Hi @yanboliang and @srowen , could you please review whether this PR
includes all your comments. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
hi @yanboliang , @srowen @jkbradley , I have updated this PR, thanks.
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/15444
[SPARK-17870][MLLIB][ML]Change statistic to pValue for SelectKBest and
SelectPercentile because of DoF difference
## What changes were proposed in this pull request?
For feature selection
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
Hi @yanboliang @srowen , these are the last two feature selection methods
based on ChiSquare, similar to the method in scikit-learn. But there
is a bug about SelectFDR in scikit-learn. I have
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16434
Hi @jkbradley , I have updated this PR per your comments. Thanks.
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/16434
[SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor change
## What changes were proposed in this pull request?
This is a follow-up PR for #15212 to address @jkbradley's comments on the
document change
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
Thanks @jkbradley , I will send a follow-up PR for your comments.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/15212
Hi @jkbradley @yanboliang , I have created a follow-up PR for this PR.
https://github.com/apache/spark/pull/16434
I have not added an FDR test in the ML suite. The main reason is the current
data set
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/16452
[ML] fix getThresholds logic error
## What changes were proposed in this pull request?
The logic of getThresholds in ML LogisticRegression is not right, and it
doesn't match
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16452
If both threshold and thresholds are not set, the master will return
thresholds.
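The threshold/thresholds interplay being debated here can be illustrated with the documented relation between the two params. This is a minimal plain-Scala sketch, not Spark's actual implementation; `ThresholdSketch` and its method names are illustrative:

```scala
// Sketch of the documented relation between the scalar binary `threshold`
// param and the two-element `thresholds` param in logistic regression.
// Illustrative only; not the Spark source.
object ThresholdSketch {
  // A scalar threshold t is equivalent to thresholds = Array(1 - t, t).
  def toThresholds(threshold: Double): Array[Double] =
    Array(1.0 - threshold, threshold)

  // Recover the scalar threshold: t = 1 / (1 + ts(0) / ts(1)).
  def toThreshold(thresholds: Array[Double]): Double = {
    require(thresholds.length == 2, "binary classification only")
    1.0 / (1.0 + thresholds(0) / thresholds(1))
  }
}
```

Round-tripping through the two forms should be the identity, which is the consistency property the PR discussion hinges on.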
Github user mpjlu closed the pull request at:
https://github.com/apache/spark/pull/16452
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16452
@sethah , thanks, I got it wrong. I will close it.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/16434
Thanks @jkbradley @srowen , I have added a code snippet for verifying with
R.
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/17739
[SPARK-20443][MLLIB][ML] set ALS blockify size
## What changes were proposed in this pull request?
The blockSize of MLLIB ALS is very important for ALS performance.
In our test
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18748
Thanks.
This is my test setting:
3 workers, each: 40 cores, 196G memory, 1 executor.
Data Size: user 480,000, item 17,000
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18832
[SPARK-21623][ML]fix RF doc
## What changes were proposed in this pull request?
The comments on parentStats in RF are wrong.
parentStats is not only used in the first iteration; it is used
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
node.stats is ImpurityStats and parentStats is Array[Double]; they are
different. Maybe this comment should be on node.stats, not on
parentStats. Is my understanding wrong?
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
I see your point.
I am confused because the code doesn't work that way.
The code updates parentStats in each iteration. Actually, we only need to
update parentStats in the first iteration
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
parentStats is used in this
code: binAggregates.getParentImpurityCalculator(), which is used in every
iteration.
So that comment seems very misleading.
`} else
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
I agree with you. Do you think we should update the comment to help others
understand the code?
Since parentStats is updated and used in each iteration.
Thanks.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
Thanks @sethah .
I strongly think we should update the comment or just delete it, as in the
current PR.
Another reason is: there are three kinds of features: categorical, ordered
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
I have tested the performance of toSparse and toSparseWithSize separately.
There is about a 35% performance improvement from this change.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
retest this please
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
retest this please
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
For PR-18904, before this change one iteration is about 58s; after this
change, one iteration is about 40s
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18904
A gentle ping: @sethah @jkbradley
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
Hi @srowen , how about using our first version? Though it duplicates some
code, the change is small.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
Yes, my only concern is that if we add toSparse(size) and have to check the
size in the code, there will be no performance gain. If we don't need to check
the "size" (comparing size with numNonZero) i
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18904
[SPARK-21624] Optimize RF communication cost
## What changes were proposed in this pull request?
The implementation of RF is bound by either the cost of statistics
computation on workers
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
Thanks @srowen.
I will revise the code per your suggestion.
When I wrote the code, I was just concerned that a user could call
toSparse(size) with a very small size.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
I did not test this PR in isolation; I was working on PR 18904 and found this
performance difference.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
Thanks @sethah @srowen . The comment is added.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18904
retest this please
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18899#discussion_r132610165
--- Diff: project/MimaExcludes.scala ---
@@ -1012,6 +1012,10 @@ object MimaExcludes {
ProblemFilters.exclude[IncompatibleResultTypeProblem
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18899
Hi @sethah , the unit test is added. Thanks
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18899#discussion_r132610049
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -635,8 +642,9 @@ class SparseVector @Since("2.0.0") (
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18868
Yes, that is right. Thanks.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18624#discussion_r127214361
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -286,40 +288,124 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
I have submitted PR for ALS optimization with GEMM. and it is ready for
review.
The performance is about 50% improvement comparing with the master method.
https://github.com/apache/spark/pull
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Hi @srowen , I have added Test Suite for BoundedPriorityQueue. Thanks.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
A user block, after the Cartesian product, will generate many blocks (one per
item block); all these blocks should be aggregated. Thanks.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
retest this please
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
Without poll, we have to use toArray.sorted, whose performance is bad.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
I have tested poll and toArray.sorted extensively.
If the queue is mostly ordered (suppose 2000 offers for a queue of size 20),
pq.toArray.sorted is faster.
If the queue is mostly disordered
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Keep it or close it; either is OK for me. We had much discussion on:
https://issues.apache.org/jira/browse/SPARK-21401
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Hi @MLnick ,
pq.toArray.sorted is also used in other places, like Word2Vec and LDA. How
about waiting for my other benchmark results, then deciding whether to close it?
Thanks.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18624#discussion_r127669102
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -286,40 +288,120 @@ object
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18551#discussion_r126665323
--- Diff: docs/ml-guide.md ---
@@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system
optimised binaries, include
project and read
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
retest this please
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18551#discussion_r126323336
--- Diff: docs/ml-guide.md ---
@@ -61,6 +61,11 @@ To configure `netlib-java` / Breeze to use system
optimised binaries, include
project and read
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
retest this please
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
I have rewritten recommendForAll with BLAS GEMM and got about a 20%-30%
performance improvement.
https://issues.apache.org/jira/browse/SPARK-21389
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18620
[MINOR][ML][MLLIB] add poll function for BoundedPriorityQueue
## What changes were proposed in this pull request?
Most BoundedPriorityQueue usages in ML/MLLIB are:
Get the value
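The usage pattern this PR targets might look like the following sketch, which uses scala.collection.mutable.PriorityQueue as a stand-in for Spark's private BoundedPriorityQueue; the class and method names here are illustrative, not the actual Spark code:

```scala
import scala.collection.mutable.PriorityQueue

// Illustrative stand-in for a bounded priority queue that keeps the top-k
// largest elements, drained in ascending order via poll-style dequeue
// instead of toArray.sorted.
class BoundedPQ(maxSize: Int) {
  // Min-heap: the smallest retained element sits at the head.
  private val pq = PriorityQueue.empty[Double](Ordering[Double].reverse)

  def offer(x: Double): Unit =
    if (pq.size < maxSize) pq.enqueue(x)
    else if (x > pq.head) { pq.dequeue(); pq.enqueue(x) }

  // poll removes and returns the smallest retained element.
  def poll(): Double = pq.dequeue()

  // Drain the queue in ascending order, avoiding a full sort.
  def drainAscending(): Array[Double] = Array.fill(pq.size)(poll())
}
```

The point under discussion is that draining via poll gives the elements in order directly, whereas toArray.sorted copies the heap and sorts it again.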
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Yes, my following PR will use it.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Ok, thanks @srowen .
I will create a JIRA and show the usage and performance comparison.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
We need the values to be in order here.
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18624
[SPARK-21389][ML][MLLIB] Optimize ALS recommendForAll by gemm with about
50% performance improvement
## What changes were proposed in this pull request?
In Spark 2.2, we have optimized ALS
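The blockified top-k idea behind this PR can be sketched roughly as follows. This is an illustration only: a plain dot-product loop stands in for the BLAS GEMM call, and the names are hypothetical, not the PR's actual code:

```scala
import scala.collection.mutable.PriorityQueue

// Sketch: score a block of user factors against a block of item factors and
// keep only the top-k (itemId, score) pairs per user, so the full score
// matrix is never materialized. A triple loop stands in for BLAS GEMM.
object BlockTopK {
  def recommend(users: Array[Array[Double]],
                items: Array[Array[Double]],
                k: Int): Array[Array[(Int, Double)]] =
    users.map { u =>
      // Min-heap on score keeps the k largest pairs seen so far.
      val pq = PriorityQueue.empty[(Int, Double)](
        Ordering.by[(Int, Double), Double](_._2).reverse)
      items.zipWithIndex.foreach { case (it, j) =>
        val score = u.zip(it).map { case (a, b) => a * b }.sum
        if (pq.size < k) pq.enqueue((j, score))
        else if (score > pq.head._2) { pq.dequeue(); pq.enqueue((j, score)) }
      }
      // Ascending from the min-heap, reversed to descending by score.
      pq.dequeueAll.reverse.toArray
    }
}
```

In the real implementation the inner scoring would be one GEMM over a user block and an item block, which is where the performance gain comes from.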
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
I am ok to close this. Thanks @MLnick
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Thanks @srowen , my test also showed pq.poll is a little faster in some
cases.
One possible benefit here is that if we provide pq.poll, a user's first choice
may be pq.poll rather than pq.toArray.sorted, which
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
I am also very confused about this. You can change
https://github.com/apache/spark/pull/18624 to sorted and test.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
My micro benchmark (a program that only tests pq.toArray.sorted,
pq.toArray.sortBy, and pq.poll) did not find a significant performance
difference. Only in the Spark job is there a big difference. Confused
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
Hi @srowen @MLnick @jkbradley @mengxr @yanboliang
Is this change acceptable? If so, I will update the ALS ML code following
this method, and also update the test suite, which is too simple
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18620
Hi @MLnick , @srowen .
My tests show pq.poll is not significantly faster than
pq.toArray.sortBy, but is significantly faster than pq.toArray.sorted. It
seems not every pq.toArray.sorted
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
Hi @felixcheung , I have tested one case: a single-threaded Java program
calling native BLAS. The performance is much better with native BLAS
multi-threading disabled (the total program
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
Hi @srowen , Thanks very much for your review.
I will revise the document of this PR to soften the language.
According to my profiling data, I guess that when the native BLAS is loaded
(or when
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
Hi @srowen , I understand Felix's point. I mean that if you have only 1 task
in C/C++ and 2 CPUs, setting native BLAS to use 2 CPUs will be faster. But in
a JVM environment, even if you have only one task and 2 CPUs
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18624
I have checked the results against the master method; the recommendation
results are correct.
The master test suite is too simple and should be updated. I will update it.
Thanks.
Thanks.
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18624#discussion_r127641933
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -286,40 +288,120 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
I found why F2J BLAS is much faster than native BLAS for xiangrui's method
(using GEMM) here.
https://issues.apache.org/jira/browse/SPARK-21305
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18551
[SPARK-21305][ML][MLLIB] Add options to disable multi-threading of native
BLAS
## What changes were proposed in this pull request?
Many ML/MLLIB algorithms use native BLAS (like Intel MKL
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18551
Thanks, @srowen . I have updated the doc.
I also validated the current option in spark-env.sh, it works.
Thanks.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
Another case
3 workers, each 40 cores, each 196G memory, each 1 executor.
Data Size: user 480,000, item 17,000
recommendProductsForUsers with blockSize 4096 is about 34s
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
Hi @MLnick , The new test results are:
3 worker, each 10 cores, each 30G memory, each 1 executor.
Data Size: user 3,290,000, item 200,000.
recommendProductsForUsers with blockSize 4096
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17742#discussion_r113444550
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -277,17 +278,39 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
Thanks very much @MLnick .
I am doing more tests on the MLLIB solution. When it is solid enough, we can
submit a follow-up PR for the ML optimization. What do you think?
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17742#discussion_r113445477
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -277,17 +278,39 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17739
Users: 480,000; items: 170,000. Thanks
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17739
Thanks @MLnick . Could you please review my other PR for the recommendForAll
performance problem:
https://github.com/apache/spark/pull/17742.
Sorry, I forgot users cannot call recommendForAll
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/17742
[SPARK-20446][ML][MLLIB] Optimize MLLIB ALS recommendForAll
## What changes were proposed in this pull request?
The recommendForAll of MLLIB ALS is very slow.
GC is a key problem
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17739
recommendProductsForUsers. Thanks
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17742#discussion_r113862880
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -276,44 +277,53 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
Hi @MLnick , I will be on vacation next week.
If you have time to create the ML optimization follow-up PR, that is OK with
me. Otherwise, I will submit the follow-up PR after my one-week vacation.
Thanks
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
retest please
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
retest this please
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
Thanks @MLnick. Please go ahead for ML API optimization.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18748
Did you test the performance of this? I tested the performance of MLLIB
recommendForUserSubset some days ago, and the performance is not good. Suppose
the time of recommendForAll is 35s; recommend for 1
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18899
[SPARK-21680][ML][MLLIB] Optimize Vector compress
## What changes were proposed in this pull request?
When using Vector.compressed to change a Vector to a SparseVector, the
performance is very
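The toSparseWithSize idea mentioned elsewhere in this thread (passing a precomputed non-zero count so the dense-to-sparse conversion needs only one scan of the values) can be sketched as follows. The names here are illustrative, not Spark's API:

```scala
// Sketch: converting a dense array to sparse (index, value) form.
// If the caller already knows nnz, the counting pass can be skipped.
object CompressSketch {
  // The extra pass that toSparseWithSize avoids repeating.
  def countNnz(values: Array[Double]): Int = values.count(_ != 0.0)

  // One scan builds the sparse representation when nnz is known up front.
  def toSparse(values: Array[Double], nnz: Int): (Array[Int], Array[Double]) = {
    val idx = new Array[Int](nnz)
    val vs = new Array[Double](nnz)
    var i = 0
    var j = 0
    while (i < values.length) {
      if (values(i) != 0.0) { idx(j) = i; vs(j) = values(i); j += 1 }
      i += 1
    }
    (idx, vs)
  }
}
```

The performance question in the thread is whether the size argument must be re-validated against a fresh numNonZero count, which would give back the pass this approach saves.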
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/18868#discussion_r131605981
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -1107,9 +1108,11 @@ private[spark] object RandomForest extends Logging
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18868
retest this please
GitHub user mpjlu opened a pull request:
https://github.com/apache/spark/pull/18868
[SPARK-21638][ML] Fix RF/GBT warning message error
## What changes were proposed in this pull request?
When training an RF model, there are many warning messages like this:
> W
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18832
Thanks @srowen , I revised the comments per Seth's suggestion: "Parent
stats need to be explicitly tracked in the DTStatsAggregator because the parent
[[Node]] object does not have Impurity
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/18748
Thanks @MLnick . I have double checked my test.
Since there is no recommendForUserSubset, my previous test was MLLIB
MatrixFactorizationModel::predict(RDD(Int, Int)), which predicts the rating
Github user mpjlu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17742#discussion_r114579054
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
---
@@ -276,44 +277,53 @@ object
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
** The most optimized version would be doing a quickselect on each row and
select top k.
** An easy-to-implement version would be:
I tested both of the methods; the best performance is about 50
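The quickselect-per-row variant quoted above can be sketched as follows. This is an illustrative implementation (Hoare partitioning, descending order), not the code that was actually benchmarked:

```scala
// Sketch: find the top-k scores of a row in expected O(n) by partitioning
// around the k-th largest element, then sort only the k survivors.
object QuickSelectTopK {
  def topK(row: Array[Double], k: Int): Array[Double] = {
    val a = row.clone()
    // Partially partition a in descending order until index `target`
    // (the k-th largest) is in its final position.
    def select(lo0: Int, hi0: Int, target: Int): Unit = {
      var lo = lo0
      var hi = hi0
      while (lo < hi) {
        val pivot = a((lo + hi) / 2)
        var i = lo
        var j = hi
        while (i <= j) {
          while (a(i) > pivot) i += 1 // descending partition
          while (a(j) < pivot) j -= 1
          if (i <= j) { val t = a(i); a(i) = a(j); a(j) = t; i += 1; j -= 1 }
        }
        if (target <= j) hi = j
        else if (target >= i) lo = i
        else return // target lies in the pivot band; done
      }
    }
    select(0, a.length - 1, k - 1)
    a.take(k).sorted(Ordering[Double].reverse)
  }
}
```

Only the final k elements are sorted, which is where the win over sorting the whole row comes from when k is small.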
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
I did not validate whether this code is right; I just tested performance.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
val srcBlocks = blockify(rank, srcFeatures)
val dstBlocks = blockify(rank, dstFeatures)
val pq = new BoundedPriorityQueue[(Int, Double)](num)(Ordering.by(_._2))
val ratings
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17742
F2J BLAS is faster than MKL BLAS. The following test is based on F2jBLAS.
Method 1: BLAS 3 + quickselect on each row and select top k.
Method 2: this PR
BLOCK size: 256 512 1024 2048
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17919
Thanks, I am ok for this change.