[GitHub] [spark] wankunde commented on pull request #41685: [SPARK-43876][SQL][FOLLOWUP] Add a unit test for fast hashmap for distinct queries

via GitHub Wed, 21 Jun 2023 19:30:25 -0700


wankunde commented on PR #41685:
URL: https://github.com/apache/spark/pull/41685#issuecomment-1601932057


   > For the 3rd case, it becomes slower?
   
   By default, `FAST_HASH_AGGREGATE_MAX_ROWS_CAPACITY_BIT = 16`, so the fast 
hash map size is `1 << 16 = 65536`, and the result is 2.0X faster. 
   
   For the 3rd case, the fast hash map size is `1 << 20 = 1048576`. I don't 
know why it is slower. Maybe the larger hash map incurs more cache misses.
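
   To make the two sizes concrete, here is a minimal sketch of the arithmetic 
above. The per-row width (`ASSUMED_BYTES_PER_ROW`) and the helper name are 
hypothetical and only for illustration; the real footprint depends on the 
schema and on how the generated map lays out its rows, so this supports the 
cache-miss guess only loosely.

```python
# Hypothetical per-row width; the real value depends on the query schema.
ASSUMED_BYTES_PER_ROW = 16


def fast_map_capacity(capacity_bit: int) -> int:
    """Row capacity implied by a capacity-bit setting: 1 << capacity_bit."""
    return 1 << capacity_bit


def approx_footprint_mib(capacity_bit: int, bytes_per_row: int = ASSUMED_BYTES_PER_ROW) -> float:
    """Rough size in MiB of a fully populated map, under the assumed row width."""
    return fast_map_capacity(capacity_bit) * bytes_per_row / (1024 * 1024)


if __name__ == "__main__":
    for bit in (16, 20):  # the default case and the 3rd benchmark case
        rows = fast_map_capacity(bit)
        print(f"capacity_bit={bit}: {rows} rows, ~{approx_footprint_mib(bit):.0f} MiB")
```

   Whatever the actual row width is, the 20-bit map holds 16 times as many 
rows as the 16-bit one, so it is much less likely to stay resident in the CPU 
caches.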
   
   This change makes small queries faster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

