Cheng Su created SPARK-35241:
--------------------------------

             Summary: Investigate to prefer vectorized hash map in hash 
aggregate selectively
                 Key: SPARK-35241
                 URL: https://issues.apache.org/jira/browse/SPARK-35241
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Cheng Su


In hash aggregate, we always use the row-based hash map as the first-level hash 
map in production, and use the vectorized hash map only in testing and 
benchmarking. However, micro-benchmarks show that the vectorized hash map can 
outperform the row-based hash map, e.g. with a single key: 
[https://github.com/apache/spark/pull/32357#discussion_r620914345]. We should 
therefore re-evaluate the decision to always use the row-based hash map, and 
possibly come up with a more adaptive policy that chooses which map to use 
depending on the keys / values.
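
To make the idea of an adaptive policy concrete, here is a minimal sketch of 
what such a decision could look like. It is not Spark code and not a proposed 
implementation: the function name, the `max_vectorized_keys` threshold, and the 
set of "fixed-width" types are all hypothetical, chosen only to illustrate a 
policy driven by the keys and values. The only input taken from the benchmark 
above is that the single-key case is a candidate for the vectorized map.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical set of fixed-width types the policy treats as good candidates
# for the vectorized map. This is an assumption for illustration only, not
# Spark's actual support matrix.
FIXED_WIDTH_TYPES = frozenset(
    {"boolean", "byte", "short", "int", "long", "float", "double"}
)


@dataclass(frozen=True)
class MapChoice:
    kind: str    # "vectorized" or "row-based"
    reason: str  # short explanation, useful when logging the decision


def choose_first_level_map(
    key_types: Sequence[str],
    value_types: Sequence[str],
    max_vectorized_keys: int = 1,  # hypothetical threshold, to be tuned by benchmarks
) -> MapChoice:
    """Pick the first-level hash map kind from the grouping keys and values."""
    if not key_types:
        return MapChoice("row-based", "no grouping keys")
    if len(key_types) > max_vectorized_keys:
        return MapChoice("row-based", "too many grouping keys")
    all_types = list(key_types) + list(value_types)
    if not all(t in FIXED_WIDTH_TYPES for t in all_types):
        return MapChoice("row-based", "non-fixed-width key or value type")
    return MapChoice("vectorized", "few fixed-width keys")


if __name__ == "__main__":
    print(choose_first_level_map(["long"], ["long"]))
    print(choose_first_level_map(["long", "string"], ["long"]))
```

Any real policy would need its thresholds and type rules derived from 
benchmark results rather than from the placeholder values used here.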



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
