wankunde commented on PR #41119:
URL: https://github.com/apache/spark/pull/41119#issuecomment-1571755196

   Add a microbenchmark to evaluate the overhead of the additional function call.
   The expressions `"a * b"`, `"a / b"`, and `"a * b / c"` are each wrapped in a function that is called only once.
   For the simple expression (`a * b`), the average query time changes from 1165ms to 1270ms.
   For the more complex expressions (`a / b` and `a * b / c`), the average query time is essentially the same as before (6527ms vs. 6582ms, and 7982ms vs. 7997ms).
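
   As a rough illustration of what "wrapped in a function and called only once" means, here is a minimal plain-Scala sketch. It is not the code generated by Spark or by this PR; `SubexprSketch`, `expensive`, `filterNaive`, and `filterShared` are hypothetical names. Note that `&&` short-circuits, so the naive form evaluates the expression twice only when the first predicate is true.

   ```scala
   object SubexprSketch {
     // Counts how many times the (hypothetical) expensive expression is evaluated.
     var evalCount = 0

     // Stand-in for a costly expression such as "a * b / c".
     def expensive(a: BigDecimal, b: BigDecimal, c: BigDecimal): BigDecimal = {
       evalCount += 1
       a * b / c
     }

     // No sharing: each predicate re-evaluates the expression.
     def filterNaive(a: BigDecimal, b: BigDecimal, c: BigDecimal): Boolean =
       expensive(a, b, c) < BigDecimal(0) && expensive(a, b, c) < BigDecimal(1)

     // Sharing: the expression is evaluated once and the result is reused.
     def filterShared(a: BigDecimal, b: BigDecimal, c: BigDecimal): Boolean = {
       val v = expensive(a, b, c)
       v < BigDecimal(0) && v < BigDecimal(1)
     }
   }
   ```

   The shared form pays for one extra call and one local value, which is the kind of overhead the benchmark below is meant to measure.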
   
   ```scala
     override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
       spark.range(1, 20000000, 1, 1)
         .selectExpr(
           "cast(id + 1 as decimal) as a",
           "cast(id + 2 as decimal) as b",
           "cast(id + 3 as decimal) as c",
           "cast(id + 4 as decimal) as d")
         .createOrReplaceTempView("tab")
       runBenchmark("Subexpression elimination in FilterExec") {
         val benchmark =
           new Benchmark("Subexpression elimination in FilterExec", 20000000, output = output)
         for (expr <- Seq("a * b", "a / b", "a * b / c")) {
           benchmark.addCase(s"Test $expr expr") { _ =>
             val query =
               s"""
                  |SELECT a, b, c, d
                  |FROM tab
                  |WHERE $expr < 0 AND $expr < 1
                  |""".stripMargin
             spark.sql(query).noop()
           }
         }
   
         benchmark.run()
       }
     }
   ```
   
   ```
   Before this change:
   Subexpression elimination in FilterExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Test a * b expr                                    1046           1165         169         19.1          52.3       1.0X
   Test a / b expr                                    6519           6527          12          3.1         325.9       0.2X
   Test a * b / c expr                                7634           7982         492          2.6         381.7       0.1X

   After this change:
   Subexpression elimination in FilterExec:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Test a * b expr                                    1245           1270          35         16.1          62.3       1.0X
   Test a / b expr                                    6469           6582         160          3.1         323.4       0.2X
   Test a * b / c expr                                7751           7997         348          2.6         387.6       0.2X
   ```
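
   For reference, the relative change in average time can be derived from the two tables above. The sketch below is only arithmetic on the reported `Avg Time(ms)` values; `OverheadSketch` and `relativeChangePct` are hypothetical names, not part of the benchmark. It gives roughly +9.0% for `a * b`, +0.8% for `a / b`, and +0.2% for `a * b / c`, the latter two being small next to the reported standard deviations.

   ```scala
   object OverheadSketch {
     // (case, avg ms before, avg ms after), copied from the tables above.
     val avgTimesMs: Seq[(String, Double, Double)] = Seq(
       ("a * b", 1165.0, 1270.0),
       ("a / b", 6527.0, 6582.0),
       ("a * b / c", 7982.0, 7997.0))

     // Relative change of `after` with respect to `before`, in percent.
     def relativeChangePct(before: Double, after: Double): Double =
       (after - before) / before * 100.0

     def main(args: Array[String]): Unit =
       avgTimesMs.foreach { case (name, before, after) =>
         println(f"$name%-10s ${relativeChangePct(before, after)}%+.1f%%")
       }
   }
   ```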
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

