zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-619692445
I also tested on a sparse dataset:
```
import org.apache.spark.ml.classification._
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

// load the sparse libsvm dataset and map the -1/1 labels to 0/1
val df = spark.read.option("numFeatures", "8289919").format("libsvm")
  .load("/data1/Datasets/webspam/webspam_wc_normalized_trigram.svm.10k")
  .withColumn("label", (col("label") + 1) / 2)

// warm-up fit
val svc = new LinearSVC().setMaxIter(10)
svc.fit(df)

// timed fit
val start = System.currentTimeMillis; val model1 = svc.setMaxIter(30).fit(df); val end = System.currentTimeMillis; end - start
```
Results:
This PR:
```
scala> val start = System.currentTimeMillis; val model1 = svc.setMaxIter(30).fit(df); val end = System.currentTimeMillis; end - start
start: Long = 1587957534286
model1: org.apache.spark.ml.classification.LinearSVCModel = LinearSVCModel: uid=linearsvc_2fcd0abbb2d7, numClasses=2, numFeatures=8289919
end: Long = 1587957684508
res1: Long = 150222
```
Master:
```
scala> val start = System.currentTimeMillis; val model1 = svc.setMaxIter(30).fit(df); val end = System.currentTimeMillis; end - start
start: Long = 1587957959670
model1: org.apache.spark.ml.classification.LinearSVCModel = LinearSVCModel: uid=linearsvc_269e4f373d2c, numClasses=2, numFeatures=8289919
end: Long = 1587958111562
res1: Long = 151892
```
If we keep `blockSize=1`, there is no performance regression on sparse datasets.
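For reference, a minimal sketch of pinning the block size to 1 on the sparse dataset. It assumes the expert `blockSize` param from the blockification work is exposed on this branch; the setter name is an assumption here, not something verified against this PR.
```
import org.apache.spark.ml.classification.LinearSVC

// Assumption: LinearSVC on this branch exposes the expert `blockSize` param
// (from the blockification work). With blockSize = 1, instances are not
// stacked into matrices, so sparse vectors keep their original layout and
// the non-blocked code path is used.
val svcSparse = new LinearSVC()
  .setMaxIter(30)
  .setBlockSize(1) // keep per-instance (non-blocked) computation for sparse data

val sparseModel = svcSparse.fit(df) // `df` is the sparse dataset loaded above
```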