This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 5dc9ba0d227 [SPARK-40669][SQL][TESTS] Parameterize `rowsNum` in `InMemoryColumnarBenchmark`

5dc9ba0d227 is described below

commit 5dc9ba0d22741173bd122afb387c54d7ca4bfb6d
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Wed Oct 5 18:01:55 2022 -0700

    [SPARK-40669][SQL][TESTS] Parameterize `rowsNum` in `InMemoryColumnarBenchmark`

    ### What changes were proposed in this pull request?

    This PR aims to parameterize `InMemoryColumnarBenchmark` to accept `rowsNum`.

    ### Why are the changes needed?

    This enables us to benchmark more flexibly.

    ```
    build/sbt "sql/test:runMain org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark 1000000"
    ...
    [info] Running benchmark: Int In-Memory scan
    [info]   Running case: columnar deserialization + columnar-to-row
    [info]   Stopped after 3 iterations, 444 ms
    [info]   Running case: row-based deserialization
    [info]   Stopped after 3 iterations, 462 ms
    [info] OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Mac OS X 12.6
    [info] Apple M1 Max
    [info] Int In-Memory scan:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    [info] --------------------------------------------------------------------------------------------------------------------------
    [info] columnar deserialization + columnar-to-row             119            148         26         8.4         118.5       1.0X
    [info] row-based deserialization                              119            154         32         8.4         119.5       1.0X
    ```

    ```
    $ build/sbt "sql/test:runMain org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark 10000000"
    ...
    [info] Running benchmark: Int In-Memory scan
    [info]   Running case: columnar deserialization + columnar-to-row
    [info]   Stopped after 3 iterations, 3855 ms
    [info]   Running case: row-based deserialization
    [info]   Stopped after 3 iterations, 4250 ms
    [info] OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Mac OS X 12.6
    [info] Apple M1 Max
    [info] Int In-Memory scan:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    [info] --------------------------------------------------------------------------------------------------------------------------
    [info] columnar deserialization + columnar-to-row            1082           1285        199         9.2         108.2       1.0X
    [info] row-based deserialization                             1057           1417        335         9.5         105.7       1.0X
    ```

    ```
    $ build/sbt "sql/test:runMain org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark 20000000"
    [info] Running benchmark: Int In-Memory scan
    [info]   Running case: columnar deserialization + columnar-to-row
    [info]   Stopped after 3 iterations, 8482 ms
    [info]   Running case: row-based deserialization
    [info]   Stopped after 3 iterations, 7534 ms
    [info] OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Mac OS X 12.6
    [info] Apple M1 Max
    [info] Int In-Memory scan:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
    [info] --------------------------------------------------------------------------------------------------------------------------
    [info] columnar deserialization + columnar-to-row            2261           2828        555         8.8         113.1       1.0X
    [info] row-based deserialization                             1788           2511       1187        11.2          89.4       1.3X
    ```

    ### Does this PR introduce _any_ user-facing change?

    No. This is benchmark test code.

    ### How was this patch tested?

    Manually.

    Closes #38114 from dongjoon-hyun/SPARK-40669.
Authored-by: Dongjoon Hyun <dongj...@apache.org>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
(cherry picked from commit 95cfdc694d3e0b68979cd06b78b52e107aa58a9f)
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 .../sql/execution/columnar/InMemoryColumnarBenchmark.scala | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
index b975451e135..55d9fb27317 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarBenchmark.scala
@@ -26,14 +26,15 @@ import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
  * {{{
  *   1. without sbt:
  *      bin/spark-submit --class <this class>
- *        --jars <spark core test jar> <spark sql test jar>
- *   2. build/sbt "sql/test:runMain <this class>"
- *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *        --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar> <rowsNum>
+ *   2. build/sbt "sql/Test/runMain <this class> <rowsNum>"
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/Test/runMain <this class>
+ *      <rowsNum>"
  *      Results will be written to "benchmarks/InMemoryColumnarBenchmark-results.txt".
  * }}}
  */
 object InMemoryColumnarBenchmark extends SqlBasedBenchmark {
-  def intCache(rowsNum: Int, numIters: Int): Unit = {
+  def intCache(rowsNum: Long, numIters: Int): Unit = {
     val data = spark.range(0, rowsNum, 1, 1).toDF("i").cache()
 
     val inMemoryScan = data.queryExecution.executedPlan.collect {
@@ -59,8 +60,9 @@ object InMemoryColumnarBenchmark extends SqlBasedBenchmark {
   }
 
   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
-    runBenchmark("Int In-memory") {
-      intCache(rowsNum = 1000000, numIters = 3)
+    val rowsNum = if (mainArgs.length > 0) mainArgs(0).toLong else 1000000
+    runBenchmark(s"Int In-memory with $rowsNum rows") {
+      intCache(rowsNum = rowsNum, numIters = 3)
     }
   }
 }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
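[Editor's note] The essence of the patch is the "optional CLI argument with a default" pattern: take `mainArgs(0)` as the row count when present, else fall back to 1000000. A minimal standalone sketch of that pattern (the `RowsNumArg` object and `parse` helper are hypothetical names for illustration, not part of the Spark tree):

```scala
// Sketch of the rowsNum-handling pattern added in this commit:
// use the first main() argument as the row count, defaulting to 1000000.
object RowsNumArg {
  def parse(mainArgs: Array[String], default: Long = 1000000L): Long =
    if (mainArgs.length > 0) mainArgs(0).toLong else default

  def main(args: Array[String]): Unit = {
    val rowsNum = parse(args)
    println(s"Int In-memory with $rowsNum rows")
  }
}
```

Note that `toLong` throws `NumberFormatException` on a malformed argument; for a test-only benchmark entry point that fail-fast behavior is acceptable, which is presumably why the patch does no further validation.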