wankunde commented on PR #41685:
URL: https://github.com/apache/spark/pull/41685#issuecomment-1600993090

   > This test looks good. I've verified that it tests the code path.
   > 
   > Another thing is, the claim of the proposed change is to improve distinct queries performance. But I don't see any reported number of performance. If you have run benchmark or you have production workloads getting improvement from it, could you post the numbers?
   
   Here is a local benchmark:
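   ```scala
   import org.apache.spark.benchmark.Benchmark
   import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
   import org.apache.spark.sql.internal.SQLConf

   object AggregateBenchmark extends SqlBasedBenchmark {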
   
     override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
       runBenchmark("aggregate benchmark with fast hashMap") {
         Seq(16, 18, 20).foreach { CAPACITY_BIT =>
           val N = 1 << CAPACITY_BIT
           val benchmark = new Benchmark(s"HashMap size : $N", N, output = output)
           val inputDF = spark
             .range(N)
             .selectExpr(
               "id",
               "(id & 1023) as k1",
               "cast(id & 1023 as string) as k2",
               "cast(id & 1023 as int) as k3",
               "cast(id & 1023 as double) as k4",
               "cast(id & 1023 as float) as k5",
               "id > 1023 as k6")
           inputDF.cache()
           // Run the same distinct() query with two-level aggregation disabled, then enabled.
           Seq(false, true).foreach { enable =>
             benchmark.addCase(s"Aggregate with two level aggregate $enable", numIters = 2) { _ =>
               withSQLConf(
                 SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> enable.toString,
                 SQLConf.FAST_HASH_AGGREGATE_MAX_ROWS_CAPACITY_BIT.key -> CAPACITY_BIT.toString) {
                 inputDF.distinct().noop()
               }
             }
           }
   
           benchmark.run()
         }
       }
     }
   }
   ```
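
   A note on the "HashMap size" label: because the unique `id` column is part of the projection, `distinct()` should produce one group per input row, i.e. `N = 1 << CAPACITY_BIT` groups, even though `k1` to `k5` only take 1024 distinct values. Below is a plain-Scala sketch of that counting argument. It does not use Spark, the names `DistinctGroups` and `rows` are illustrative only, and it models just the integer-valued columns with a smaller bit width to keep it quick:

   ```scala
   object DistinctGroups {
     // Integer-valued subset of the benchmark's projection: (id, id & 1023, id > 1023).
     def rows(capacityBit: Int): Seq[(Long, Long, Boolean)] = {
       val n = 1L << capacityBit
       (0L until n).map(id => (id, id & 1023L, id > 1023L))
     }

     def main(args: Array[String]): Unit = {
       val bit = 12 // smaller than the benchmark's 16/18/20, for illustration only
       val all = rows(bit)
       // Every row is unique because id is unique, so distinct keeps all of them.
       println(s"rows = ${all.size}, distinct rows = ${all.distinct.size}")
       // The masked key on its own has at most 1024 distinct values.
       println(s"distinct k1 values = ${all.map(_._2).distinct.size}")
     }
   }
   ```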
   
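   Benchmark result (two-level aggregation is 2.0X faster at 65536 rows and 1.3X faster at 262144 rows, but slightly slower, 0.9X, at 1048576 rows, where the run-to-run variance is also high):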
   ```
   Running benchmark: HashMap size : 65536
     Running case: Aggregate with two level aggregate false
     Stopped after 2 iterations, 240 ms
     Running case: Aggregate with two level aggregate true
     Stopped after 2 iterations, 119 ms
   
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.16
   Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
   HashMap size : 65536:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Aggregate with two level aggregate false            117            120           4          0.6        1791.9       1.0X
   Aggregate with two level aggregate true              58             60           3          1.1         880.0       2.0X
   
   Running benchmark: HashMap size : 262144
     Running case: Aggregate with two level aggregate false
     Stopped after 2 iterations, 339 ms
     Running case: Aggregate with two level aggregate true
     Stopped after 2 iterations, 270 ms
   
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.16
   Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
   HashMap size : 262144:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Aggregate with two level aggregate false            169            170           1          1.6         644.7       1.0X
   Aggregate with two level aggregate true             134            135           2          2.0         510.3       1.3X
   
   Running benchmark: HashMap size : 1048576
     Running case: Aggregate with two level aggregate false
     Stopped after 2 iterations, 1353 ms
     Running case: Aggregate with two level aggregate true
     Stopped after 2 iterations, 1771 ms
   
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.16
   Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
   HashMap size : 1048576:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Aggregate with two level aggregate false            672            677           6          1.6         641.2       1.0X
   Aggregate with two level aggregate true             749            886         193          1.4         714.2       0.9X
   ```
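
   For reading the tables: as far as I understand Spark's `Benchmark` utility, the Relative column is the first case's best time divided by each case's best time. Below is a minimal sketch that recomputes it from the rounded "Best Time(ms)" values above. The object and method names are illustrative, not part of Spark, and because the inputs are already rounded the results only match the table to one decimal place (2.0X, 1.3X, 0.9X):

   ```scala
   object RelativeSpeedup {
     // (hash map size, best ms with two-level disabled, best ms with two-level enabled),
     // copied from the rounded "Best Time(ms)" column of the results above.
     val results: Seq[(Int, Double, Double)] = Seq(
       (65536, 117.0, 58.0),
       (262144, 169.0, 134.0),
       (1048576, 672.0, 749.0))

     // Speed of a case relative to the baseline: values above 1.0 mean the case is faster.
     def relative(baselineMs: Double, caseMs: Double): Double = baselineMs / caseMs

     def main(args: Array[String]): Unit = {
       results.foreach { case (n, disabledMs, enabledMs) =>
         println(f"HashMap size $n%d: ${relative(disabledMs, enabledMs)}%.1fX")
       }
     }
   }
   ```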



