liupc commented on pull request #27968:
URL: https://github.com/apache/spark/pull/27968#issuecomment-628498669


   > I understand the size estimation of AppendOnlyMap can be inaccurate, but the proposed approach may lead to a performance regression. Do you have any benchmark results for this change?
   
   There is a slight performance regression, but I think it's because the samples are chosen differently.
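To illustrate why the choice of samples matters, here is a minimal Python sketch of sampling-based size estimation. It measures a random subset of slots in a backing array and scales the sum up to the full length. The helper `sampled_size_estimate`, the per-object cost model in `sizeof`, and the interleaved key/value layout with empty buckets are all hypothetical simplifications. This is not Spark's `SizeEstimator` and not the sampling scheme used in this PR.

```python
import random


def sampled_size_estimate(slots, sample_count, sizeof, seed):
    """Hypothetical sampling estimator: measure `sample_count` distinct random
    slots and extrapolate linearly to the whole array."""
    n = len(slots)
    if n <= sample_count:
        # Small arrays are measured exactly.
        return float(sum(sizeof(s) for s in slots))
    rng = random.Random(seed)
    indices = rng.sample(range(n), sample_count)  # distinct indices, no repeats
    sampled = sum(sizeof(slots[i]) for i in indices)
    return sampled * n / sample_count


def sizeof(obj):
    # Hypothetical cost model: an empty slot costs nothing, otherwise a
    # 16-byte header plus one byte per character.
    return 0 if obj is None else 16 + len(obj)


# Hypothetical map-like backing array: keys and values interleaved,
# with every third bucket left empty.
slots = []
for i in range(2000):
    if i % 3 == 0:
        slots.extend([None, None])
    else:
        slots.extend([f"key{i}", "v" * (i % 50)])

exact = sum(sizeof(s) for s in slots)
for seed in (1, 2, 3):
    estimate = sampled_size_estimate(slots, 100, sizeof, seed)
    print(f"seed={seed}: estimate={estimate:.0f} bytes, exact={exact} bytes")
```

Each seed draws a different set of slots, so the estimate moves around the exact value. Changing how the samples are picked changes both the estimate and the cost of computing it.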
   
   Here are some benchmark results with this change, for the test case in the UT (200,000 records inserted into the AppendOnlyMap):
   
   ```
   Running benchmark: Perf of estimating AppendOnlyMap
     Running case: sample AppendOnlyMap
     Stopped after 1000 iterations, 115 ms
   
   OpenJDK 64-Bit Server VM 1.8.0_202-b08 on Linux 4.15.0-99-generic
   Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
   Perf of estimating AppendOnlyMap:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   sample AppendOnlyMap                                  0              0           0       1805.1           0.6       1.0X
   ```
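The columns in these tables are related, which helps when Best Time(ms) and Avg Time(ms) round down to 0. Below is a small Python sketch of that arithmetic for the run above. It assumes, as the printed values suggest, that Per Row(ns) is 1000 / Rate(M/s) and that the rate is computed over the 200,000 records processed per iteration.

```python
RATE_M_PER_S = 1805.1          # "Rate(M/s)" from the table above
ROWS_PER_ITERATION = 200_000   # assumption: the UT's 200,000 records per iteration
TOTAL_MS = 115                 # "Stopped after 1000 iterations, 115 ms"
ITERATIONS = 1000


def per_row_ns(rate_m_per_s):
    """Nanoseconds per row for a rate given in millions of rows per second."""
    return 1000.0 / rate_m_per_s


def best_ms_per_iteration(rows, rate_m_per_s):
    """Milliseconds needed to process `rows` at the given rate."""
    return rows / (rate_m_per_s * 1e6) * 1e3


print(f"Per Row(ns): {per_row_ns(RATE_M_PER_S):.1f}")
print(f"best time per iteration: {best_ms_per_iteration(ROWS_PER_ITERATION, RATE_M_PER_S):.3f} ms")
print(f"average time per iteration: {TOTAL_MS / ITERATIONS:.3f} ms")
```

A best time of about 0.11 ms per iteration is consistent with the 115 ms total over 1000 iterations, which would explain the 0 in the millisecond columns.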
   
   These are the benchmark results without this change:
   ```
   Running benchmark: Perf of estimating AppendOnlyMap
     Running case: sample AppendOnlyMap
     Stopped after 1000 iterations, 24 ms
   
   OpenJDK 64-Bit Server VM 1.8.0_202-b08 on Linux 4.15.0-99-generic
   Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
   Perf of estimating AppendOnlyMap:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   sample AppendOnlyMap                                  0              0           0       8600.3           0.1       1.0X
   ```
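To put the two runs side by side, the ratios can be computed directly from the printed numbers. A minimal Python sketch; the constants are copied from the two tables and the "Stopped after ..." lines, and nothing else is assumed.

```python
# Rates reported in the two tables (millions of rows per second).
RATE_WITH_CHANGE = 1805.1
RATE_WITHOUT_CHANGE = 8600.3

# Total measured time over 1000 iterations, from "Stopped after 1000 iterations, N ms".
TOTAL_MS_WITH_CHANGE = 115
TOTAL_MS_WITHOUT_CHANGE = 24


def slowdown(baseline_rate, candidate_rate):
    """How many times slower the candidate is than the baseline, by rate."""
    return baseline_rate / candidate_rate


rate_ratio = slowdown(RATE_WITHOUT_CHANGE, RATE_WITH_CHANGE)
time_ratio = TOTAL_MS_WITH_CHANGE / TOTAL_MS_WITHOUT_CHANGE

print(f"rate ratio: {rate_ratio:.2f}x")
print(f"total-time ratio: {time_ratio:.2f}x")
```

Both ratios come out at roughly 4.8, so the rate-based and wall-clock comparisons agree on these runs.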


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



