zhengruifeng edited a comment on issue #27360: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
URL: https://github.com/apache/spark/pull/27360#issuecomment-578407327

testCode:

```scala
import org.apache.spark.ml.classification._
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

// load a9a and map labels from {-1, +1} to {0, 1}
var df = spark.read.format("libsvm").load("/data1/Datasets/a9a/a9a").withColumn("label", (col("label") + 1) / 2)
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count

// double the dataset 8 times (2^8 = 256 copies of a9a)
(0 until 8).foreach { _ => df = df.union(df) }
df.count

new LinearSVC().setMaxIter(10).fit(df) // warm up

// fitIntercept = true (default); re-declaring vals is allowed in spark-shell
val svc = new LinearSVC().setMaxIter(100)
val start = System.currentTimeMillis; val model = svc.fit(df); val end = System.currentTimeMillis; end - start

// fitIntercept = false
val svc = new LinearSVC().setMaxIter(100).setFitIntercept(false)
val start = System.currentTimeMillis; val model = svc.fit(df); val end = System.currentTimeMillis; end - start
```

result:

| | RAM | Duration, fitIntercept=true (ms) | Duration, fitIntercept=false (ms) |
|---|---|---|---|
| this PR | 1418.9M | 396524 | 324944 |
| master | 2.3G | 446354 | 441961 |

Native BLAS was NOT used in the tests above; further performance gains may be possible by configuring an appropriate native BLAS.
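The test code repeats the same `start`/`end` timing statement for each fit and the results are compared by eye. A minimal sketch of a reusable helper, assuming wall-clock milliseconds are the metric of interest; `BenchUtil`, `timeMillis`, `speedup` and `BenchUtilDemo` are hypothetical names, not Spark APIs. The demo plugs in the durations reported in the comment to express the gain of this PR over master as a ratio.

```scala
// Hypothetical helpers for the benchmark in this comment; these are NOT Spark APIs.
object BenchUtil {
  // Runs `body` once and returns its result together with the elapsed
  // wall-clock time in milliseconds (same System.currentTimeMillis
  // approach as the original test code).
  def timeMillis[T](body: => T): (T, Long) = {
    val start = System.currentTimeMillis
    val result = body
    (result, System.currentTimeMillis - start)
  }

  // Baseline duration divided by candidate duration:
  // a value > 1.0 means the candidate finished faster.
  def speedup(baselineMs: Long, candidateMs: Long): Double =
    baselineMs.toDouble / candidateMs
}

object BenchUtilDemo {
  def main(args: Array[String]): Unit = {
    // Durations (ms) as reported in the comment: master vs. this PR.
    val withIntercept = BenchUtil.speedup(446354L, 396524L)
    val withoutIntercept = BenchUtil.speedup(441961L, 324944L)
    println(f"fitIntercept=true:  $withIntercept%.2fx")    // prints fitIntercept=true:  1.13x
    println(f"fitIntercept=false: $withoutIntercept%.2fx") // prints fitIntercept=false: 1.36x
  }
}
```

In spark-shell, `val (model, ms) = BenchUtil.timeMillis(svc.fit(df))` would replace the three-statement timing line. Since each configuration is timed only once after a single warm-up fit, averaging several runs would give a more stable comparison.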
