zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-624405669
Merged to master
This is an automated message from the Apache Git Service.
zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-623930812
> we need to tell the user about this tradeoff in the doc above.
@srowen I think there may be other implementations (LoR/LiR/KMeans/GMM/...)
that will support blockify
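Roughly, "blockify" here means stacking several instances into one dense block so that later steps can work on a matrix instead of on one vector at a time. The following is a minimal sketch, assuming dense feature arrays of equal length and a row-major layout. `DenseInstanceBlock` and `BlockifySketch.blockify` are hypothetical names for illustration, not Spark's API and not the code in this PR.

```scala
// Hypothetical container: a block of dense instances stored row-major,
// so row i occupies values(i * numFeatures until (i + 1) * numFeatures).
final case class DenseInstanceBlock(labels: Array[Double], values: Array[Double], numFeatures: Int) {
  def size: Int = labels.length
}

object BlockifySketch {
  // Groups (label, features) pairs into blocks of at most blockSize rows.
  // The last block is smaller when the input size is not a multiple of blockSize.
  def blockify(instances: Iterator[(Double, Array[Double])], blockSize: Int): Iterator[DenseInstanceBlock] = {
    require(blockSize > 0, "blockSize must be positive")
    instances.grouped(blockSize).map { group =>
      val numFeatures = group.head._2.length
      require(group.forall(_._2.length == numFeatures), "all rows must have the same length")
      // Concatenating the rows in order yields the row-major layout.
      DenseInstanceBlock(group.map(_._1).toArray, group.flatMap(_._2.toSeq).toArray, numFeatures)
    }
  }
}
```

For five two-feature instances and `blockSize = 2`, this yields blocks of 2, 2 and 1 rows. The same grouping step is independent of the model, which is why it could be shared by other estimators.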
zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-620976390
I will merge this PR this week if nobody objects.
Unlike the [previous one](https://github.com/apache/spark/pull/27360), this PR will not cause a
performance regression…
zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-619692445
I also tested on a sparse dataset:
```
import org.apache.spark.ml.classification._
import org.apache.spark.storage.StorageLevel
val df = spark.read.option("numFea
```
zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-619512391
The speedup is more significant than in
https://github.com/apache/spark/pull/27360.
I think that is because dataset `epsilon` has 2,000 features, while a9a only
has…
zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-619511225
The main part of this PR is similar to
https://github.com/apache/spark/pull/27360,
except that this PR falls back to the original implementation when `blockSize=1`.
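Falling back to the original implementation when `blockSize=1` can be pictured as a plain branch on the parameter. This is a minimal sketch, assuming hinge loss, dense rows and 0/1 labels. `BlockSizeDispatchSketch` and its methods are hypothetical, and its blocked path only groups rows rather than calling BLAS, so it illustrates the dispatch, not the speedup.

```scala
object BlockSizeDispatchSketch {
  // Hinge loss for one instance; labels 0/1 are mapped to -1/+1.
  private def hinge(label: Double, features: Array[Double], w: Array[Double]): Double = {
    val margin = features.indices.map(j => features(j) * w(j)).sum
    math.max(0.0, 1.0 - (2.0 * label - 1.0) * margin)
  }

  // Original path: visit instances one at a time.
  private def perInstanceLoss(data: Seq[(Double, Array[Double])], w: Array[Double]): Double =
    data.map { case (label, x) => hinge(label, x, w) }.sum

  // Blocked path: process blockSize instances per step. Here each group is only
  // looped over; a real blocked path would stack it into a matrix for BLAS.
  private def blockedLoss(data: Seq[(Double, Array[Double])], w: Array[Double], blockSize: Int): Double =
    data.grouped(blockSize).map(block => perInstanceLoss(block, w)).sum

  // blockSize == 1 runs exactly the pre-existing per-instance code.
  def totalLoss(data: Seq[(Double, Array[Double])], w: Array[Double], blockSize: Int): Double = {
    require(blockSize >= 1, "blockSize must be at least 1")
    if (blockSize == 1) perInstanceLoss(data, w) else blockedLoss(data, w, blockSize)
  }
}
```

Keeping the old path reachable through a parameter value means users who set `blockSize=1` get the previous behavior unchanged, while both paths should agree on the result.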
zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-619510418
friendly ping @srowen @WeichenXu123
Using high-level BLAS on dense datasets makes SVC much faster than the existing
implementation, even without NativeBLAS.
To avoid performance…
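The reason a block helps is that the margins of all rows become one matrix-vector product, and the gradient becomes one transposed product, instead of one dot product per instance. This is a minimal sketch of that structure for the standard hinge loss, assuming dense row-major blocks and 0/1 labels mapped to -1/+1. `BlockHingeSketch`, `gemv` and `gemvT` are hypothetical names; the hand-written loops stand in for a BLAS `dgemv` call, and this is not the code from the PR.

```scala
object BlockHingeSketch {
  // y = A * x for a numRows x numCols matrix A stored row-major.
  // Hand-written stand-in for a BLAS dgemv call.
  def gemv(a: Array[Double], numRows: Int, numCols: Int, x: Array[Double]): Array[Double] = {
    val y = new Array[Double](numRows)
    var i = 0
    while (i < numRows) {
      val offset = i * numCols
      var s = 0.0
      var j = 0
      while (j < numCols) { s += a(offset + j) * x(j); j += 1 }
      y(i) = s
      i += 1
    }
    y
  }

  // g = A^T * m: accumulates m(i) * row i into g (a transposed dgemv).
  def gemvT(a: Array[Double], numRows: Int, numCols: Int, m: Array[Double]): Array[Double] = {
    val g = new Array[Double](numCols)
    var i = 0
    while (i < numRows) {
      val offset = i * numCols
      var j = 0
      while (j < numCols) { g(j) += a(offset + j) * m(i); j += 1 }
      i += 1
    }
    g
  }

  // Summed hinge loss and its subgradient over one block.
  // Labels are 0/1 and are mapped to -1/+1 before computing 1 - y * margin.
  def lossAndGradient(values: Array[Double], labels: Array[Double],
                      numFeatures: Int, w: Array[Double]): (Double, Array[Double]) = {
    val n = labels.length
    val margins = gemv(values, n, numFeatures, w) // every w . x_i in one product
    val multipliers = new Array[Double](n)         // d(loss_i) / d(margin_i)
    var loss = 0.0
    var i = 0
    while (i < n) {
      val y = 2.0 * labels(i) - 1.0
      val hinge = 1.0 - y * margins(i)
      if (hinge > 0.0) { loss += hinge; multipliers(i) = -y }
      i += 1
    }
    (loss, gemvT(values, n, numFeatures, multipliers)) // gradient = A^T * multipliers
  }
}
```

With rows `[2, 0]`, `[0, 0.5]`, `[1, 1]`, labels `1, 0, 1` and `w = [1, -1]`, the block loss is 1.5 and the gradient is `[-1.0, -0.5]`. Because the per-block work reduces to two matrix-vector products, it can be handed to a BLAS library, which is where a speedup on dense data would come from.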
zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-619509562
dataset:
[epsilon_normalized.t](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html),
numInstances=100,000, numFeatures=2,000
testCode:
```