zhengruifeng commented on issue #27374: [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors
URL: https://github.com/apache/spark/pull/27374#issuecomment-579254416

env: bin/spark-shell --driver-memory=32G

testCode:

```scala
import org.apache.spark.ml.classification._
import org.apache.spark.storage.StorageLevel

var df = spark.read.format("libsvm").load("/data1/Datasets/a9a/a9a").withColumn("label", (col("label")+1)/2)
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count

(0 until 8).foreach{ _ => df = df.union(df) }
df.count

new LogisticRegression().setMaxIter(10).fit(df)

val lr1 = new LogisticRegression().setMaxIter(100).setFamily("binomial")
val start = System.currentTimeMillis; val model1 = lr1.fit(df); val end = System.currentTimeMillis; end - start

val lr2 = new LogisticRegression().setMaxIter(100).setFitIntercept(false).setFamily("binomial")
val start = System.currentTimeMillis; val model2 = lr2.fit(df); val end = System.currentTimeMillis; end - start

val lr3 = new LogisticRegression().setMaxIter(100).setFamily("multinomial")
val start = System.currentTimeMillis; val model3 = lr3.fit(df); val end = System.currentTimeMillis; end - start

val lr4 = new LogisticRegression().setMaxIter(100).setFitIntercept(false).setFamily("multinomial")
val start = System.currentTimeMillis; val model4 = lr4.fit(df); val end = System.currentTimeMillis; end - start
```

result:

this PR:
RAM: 1418.9M
DURATION: 136217, 161194, 171625, 177116

Master:
RAM: 2.3G
DURATION: 217035, 218267, 239111, 250163
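
To make the comparison easier to read, the reported durations can be turned into per-configuration speedups. The snippet below is only a sketch for a Scala REPL or spark-shell session. It assumes that the four DURATION values are listed in the order lr1 to lr4 and that they are in milliseconds (they come from `System.currentTimeMillis`). The `labels`, `prMs`, `masterMs`, and `speedups` names are illustrative and not part of the original comment.

```scala
// Durations in milliseconds, copied from the results reported above.
// Assumption: values are listed in the order lr1, lr2, lr3, lr4.
val labels = Seq(
  "binomial",
  "binomial, no intercept",
  "multinomial",
  "multinomial, no intercept")
val prMs = Seq(136217L, 161194L, 171625L, 177116L)
val masterMs = Seq(217035L, 218267L, 239111L, 250163L)

// Speedup = master duration divided by this PR's duration.
val speedups = masterMs.zip(prMs).map { case (m, p) => m.toDouble / p }

// Print one line per configuration, e.g. the label followed by the ratio.
labels.zip(speedups).foreach { case (label, s) =>
  println(f"$label%-26s $s%.2fx")
}
```

Under those assumptions, the ratios fall roughly between 1.35x and 1.59x in favor of this PR. These are single-run numbers from one machine, so they are best read as indicative rather than precise.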
