LuciferYang opened a new pull request #29529: URL: https://github.com/apache/spark/pull/29529
### What changes were proposed in this pull request?

This PR reverts SPARK-32550 for performance reasons.

### Why are the changes needed?

SPARK-32550 had a negative impact on performance. A typical case is "deterministic cardinality estimation" in `HyperLogLogPlusPlusSuite` when rsd is 0.001. The code that became significantly slower is line 41 of `HyperLogLogPlusPlusSuite`:

`new SpecificInternalRow(hll.aggBufferAttributes.map(_.dataType))`

https://github.com/apache/spark/blob/08b951b1cb58cea2c34703e43202fe7c84725c8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala#L40-L44

The size of `hll.aggBufferAttributes` in this case is 209716. The timings before and after SPARK-32550 was merged are compared below. All values are in ns.

| Case | After SPARK-32550 createBuffer | After SPARK-32550 end to end | Before SPARK-32550 createBuffer | Before SPARK-32550 end to end |
| -- | -- | -- | -- | -- |
| rsd 0.001, n 1000 | 52715513243 | 53004810687 | 195807999 | 773977677 |
| rsd 0.001, n 5000 | 51881246165 | 52519358215 | 13689949 | 249974855 |
| rsd 0.001, n 10000 | 52234282788 | 52374639172 | 14199071 | 183452846 |
| rsd 0.001, n 50000 | 55503517122 | 55664035449 | 15219394 | 584477125 |
| rsd 0.001, n 100000 | 51862662845 | 52116774177 | 19662834 | 166483678 |
| rsd 0.001, n 500000 | 51619226715 | 52183189526 | 178048012 | 16681330 |
| rsd 0.001, n 1000000 | 54861366981 | 54976399142 | 226178708 | 18826340 |
| rsd 0.001, n 5000000 | 52023602143 | 52354615149 | 388173579 | 15446409 |
| rsd 0.001, n 10000000 | 53008591660 | 53601392304 | 533454460 | 16033032 |

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`mvn test -pl sql/catalyst -DwildcardSuites=org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlusSuite -Dtest=none`

**Before**: 8 m 18 s 320 ms

**After**: 6 s 278 ms
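
For context on the buffer size of 209716 quoted above, here is a rough sketch of where that number likely comes from. It assumes the HyperLogLog++ sizing rules recalled from Spark's `HyperLogLogPlusPlusHelper`: precision `p = ceil(2 * log2(1.106 / rsd))`, `m = 2^p` registers of 6 bits each, packed 10 per 64-bit word, plus one extra word. The object and method names below are hypothetical and the code does not depend on Spark.

```scala
// Standalone sketch (no Spark dependency). The formulas are an assumption
// based on Spark's HyperLogLogPlusPlusHelper; names here are illustrative only.
object HllBufferSize {
  // Assumed: each register takes 6 bits, so 64 / 6 = 10 registers fit in a Long.
  val RegisterSize: Int = 6
  val RegistersPerWord: Int = java.lang.Long.SIZE / RegisterSize

  // Precision p: number of hash bits used to pick a register.
  def precision(rsd: Double): Int =
    math.ceil(2.0 * math.log(1.106 / rsd) / math.log(2.0)).toInt

  // Number of Long words, i.e. the assumed number of aggregation buffer attributes.
  def numWords(rsd: Double): Int = {
    val m = 1 << precision(rsd) // number of registers
    m / RegistersPerWord + 1
  }

  def main(args: Array[String]): Unit = {
    val rsd = 0.001
    val p = precision(rsd)
    println(s"p=$p m=${1 << p} numWords=${numWords(rsd)}")
    // prints "p=21 m=2097152 numWords=209716"
  }
}
```

Under these assumptions, rsd 0.001 gives a row with 209716 fields, so any per-field overhead in constructing `SpecificInternalRow` is multiplied by roughly 200 thousand for every buffer the test creates.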
