This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 5dc9ba0d227 [SPARK-40669][SQL][TESTS] Parameterize `rowsNum` in `InMemoryColumnarBenchmark`

5dc9ba0d227 is described below

commit 5dc9ba0d22741173bd122afb387c54d7ca4bfb6d
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Wed Oct 5 18:01:55 2022 -0700

    [SPARK-40669][SQL][TESTS] Parameterize `rowsNum` in `InMemoryColumnarBenchmark`

    ### What changes were proposed in this pull request?

    This PR aims to parameterize `InMemoryColumnarBenchmark` to accept `rowsNum`.

    ### Why are the changes needed?

    This enables us to benchmark more flexibly.

    ```
    build/sbt "sql/test:runMain org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark 1000000"
    ...
    [info] Running benchmark: Int In-Memory scan
    [info]   Running case: columnar deserialization + columnar-to-row
    [info]   Stopped after 3 iterations, 444 ms
    [info]   Running case: row-based deserialization
    [info]   Stopped after 3 iterations, 462 ms
    [info] OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Mac OS X 12.6
    [info] Apple M1 Max
    [info] Int In-Memory scan:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    [info] --------------------------------------------------------------------------------------------------------------------------
    [info] columnar deserialization + columnar-to-row             119            148         26         8.4         118.5       1.0X
    [info] row-based deserialization                              119            154         32         8.4         119.5       1.0X
    ```

    ```
    $ build/sbt "sql/test:runMain org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark 10000000"
    ...
    [info] Running benchmark: Int In-Memory scan
    [info]   Running case: columnar deserialization + columnar-to-row
    [info]   Stopped after 3 iterations, 3855 ms
    [info]   Running case: row-based deserialization
    [info]   Stopped after 3 iterations, 4250 ms
    [info] OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Mac OS X 12.6
    [info] Apple M1 Max
    [info] Int In-Memory scan:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    [info] --------------------------------------------------------------------------------------------------------------------------
    [info] columnar deserialization + columnar-to-row            1082           1285        199         9.2         108.2       1.0X
    [info] row-based deserialization                             1057           1417        335         9.5         105.7       1.0X
    ```

    ```
    $ build/sbt "sql/test:runMain org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark 20000000"
    [info] Running benchmark: Int In-Memory scan
    [info]   Running case: columnar deserialization + columnar-to-row
    [info]   Stopped after 3 iterations, 8482 ms
    [info]   Running case: row-based deserialization
    [info]   Stopped after 3 iterations, 7534 ms
    [info] OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Mac OS X 12.6
    [info] Apple M1 Max
    [info] Int In-Memory scan:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    [info] --------------------------------------------------------------------------------------------------------------------------
    [info] columnar deserialization + columnar-to-row            2261           2828        555         8.8         113.1       1.0X
    [info] row-based deserialization                             1788           2511       1187        11.2          89.4       1.3X
    ```

    ### Does this PR introduce _any_ user-facing change?

    No. This is benchmark test code.

    ### How was this patch tested?

    Manually.

    Closes #38114 from dongjoon-hyun/SPARK-40669.
Authored-by: Dongjoon Hyun <dongj...@apache.org>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
(cherry picked from commit 95cfdc694d3e0b68979cd06b78b52e107aa58a9f)
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 .../sql/execution/columnar/InMemoryColumnarBenchmark.scala | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
index b975451e135..55d9fb27317 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
@@ -26,14 +26,15 @@ import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
  * {{{
  *   1. without sbt:
  *      bin/spark-submit --class <this class>
- *        --jars <spark core test jar> <spark sql test jar>
- *   2. build/sbt "sql/test:runMain <this class>"
- *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *        --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar> <rowsNum>
+ *   2. build/sbt "sql/Test/runMain <this class> <rowsNum>"
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/Test/runMain <this class>
+ *      <rowsNum>"
  *      Results will be written to "benchmarks/InMemoryColumnarBenchmark-results.txt".
  * }}}
  */
 object InMemoryColumnarBenchmark extends SqlBasedBenchmark {
-  def intCache(rowsNum: Int, numIters: Int): Unit = {
+  def intCache(rowsNum: Long, numIters: Int): Unit = {
     val data = spark.range(0, rowsNum, 1, 1).toDF("i").cache()
 
     val inMemoryScan = data.queryExecution.executedPlan.collect {
@@ -59,8 +60,9 @@ object InMemoryColumnarBenchmark extends SqlBasedBenchmark {
   }
 
   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
-    runBenchmark("Int In-memory") {
-      intCache(rowsNum = 1000000, numIters = 3)
+    val rowsNum = if (mainArgs.length > 0) mainArgs(0).toLong else 1000000
+    runBenchmark(s"Int In-memory with $rowsNum rows") {
+      intCache(rowsNum = rowsNum, numIters = 3)
     }
   }
 }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
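[Editor's note] The essence of the patch is the "optional CLI argument with a default" pattern: take `mainArgs(0)` as the row count when present, else fall back to 1000000. A minimal standalone sketch of that pattern (the `RowsNumArg` object and `parse` helper are hypothetical names for illustration, not part of the Spark tree):

```scala
// Sketch of the rowsNum-handling pattern added in this commit:
// use the first main() argument as the row count, defaulting to 1000000.
object RowsNumArg {
  def parse(mainArgs: Array[String], default: Long = 1000000L): Long =
    if (mainArgs.length > 0) mainArgs(0).toLong else default

  def main(args: Array[String]): Unit = {
    val rowsNum = parse(args)
    println(s"Int In-memory with $rowsNum rows")
  }
}
```

Note that `toLong` throws `NumberFormatException` on a malformed argument; for a test-only benchmark entry point that fail-fast behavior is acceptable, which is presumably why the patch does no further validation.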