LuciferYang edited a comment on pull request #29366:
URL: https://github.com/apache/spark/pull/29366#issuecomment-679078717


   @srowen @HyukjinKwon @dongjoon-hyun @msamirkhan Hi, has anyone looked at 
the performance impact of this change? I found that it has a negative impact 
on performance, so I created a new Jira, 
[SPARK-32690](https://issues.apache.org/jira/browse/SPARK-32690).
   
   A typical case is the "deterministic cardinality estimation" test in 
`HyperLogLogPlusPlusSuite` when rsd is 0.001. The code that became 
significantly slower is line 41 of `HyperLogLogPlusPlusSuite`:
   
   
https://github.com/apache/spark/blob/08b951b1cb58cea2c34703e43202fe7c84725c8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala#L40-L44
   
   The size of `hll.aggBufferAttributes` in this case is 209716. The timings 
before and after SPARK-32550 was merged are as follows (unit: ns):
   
   
     | After SPARK-32550 create createBuffer | After SPARK-32550 end to end | Before SPARK-32550 create input | Before SPARK-32550 end to end
   -- | -- | -- | -- | --
   rsd 0.001, n   1000 | 52715513243 | 53004810687 | 195807999 | 773977677
   rsd 0.001, n   5000 | 51881246165 | 52519358215 | 13689949 | 249974855
   rsd 0.001, n   10000 | 52234282788 | 52374639172 | 14199071 | 183452846
   rsd 0.001, n   50000 | 55503517122 | 55664035449 | 15219394 | 584477125
   rsd 0.001, n   100000 | 51862662845 | 52116774177 | 19662834 | 166483678
   rsd 0.001, n   500000 | 51619226715 | 52183189526 | 178048012 | 16681330
   rsd 0.001, n   1000000 | 54861366981 | 54976399142 | 226178708 | 18826340
   rsd 0.001, n   5000000 | 52023602143 | 52354615149 | 388173579 | 15446409
   rsd 0.001, n   10000000 | 53008591660 | 53601392304 | 533454460 | 16033032
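
   For context on the 209716 figure, here is a minimal sketch of how the buffer size can be derived from rsd. It assumes the sizing used by Spark's HyperLogLog++ helper: precision `p = ceil(2 * log2(1.106 / rsd))`, `m = 2^p` registers of 6 bits each, packed 10 per 64-bit word, plus one extra word. The function name `hll_buffer_words` is hypothetical and the formulas are an assumption about the implementation, not copied from this PR.

```python
import math

REGISTER_SIZE = 6                          # bits per register (assumed)
REGISTERS_PER_WORD = 64 // REGISTER_SIZE   # 10 registers fit in one 64-bit word


def hll_buffer_words(relative_sd: float) -> int:
    """Estimate the number of 64-bit buffer words for a given rsd.

    Assumed sizing: p = ceil(2 * log2(1.106 / rsd)), m = 2^p registers,
    words = m // REGISTERS_PER_WORD + 1.
    """
    p = math.ceil(2.0 * math.log(1.106 / relative_sd) / math.log(2.0))
    m = 1 << p
    return m // REGISTERS_PER_WORD + 1


if __name__ == "__main__":
    # rsd = 0.001 gives p = 21 and m = 2097152 registers
    print(hll_buffer_words(0.001))  # prints 209716
```

   Under these assumptions, rsd 0.001 yields one buffer attribute per word, which matches the 209716 attributes reported above and suggests why any per-attribute cost in creating the buffer becomes so visible in this test.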
   
   To verify the results above, run 
`mvn test -pl sql/catalyst -DwildcardSuites=org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlusSuite -Dtest=none`.
   

