[GitHub] [spark] karuppayya commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

GitBox Mon, 17 Aug 2020 12:11:41 -0700


karuppayya commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-675060833



   @cloud-fan 
   We observed this behaviour (partial aggregation not helping) at one of our 
customers.
   Initially, I disabled partial aggregation completely by setting the 
aggregate mode to `org.apache.spark.sql.catalyst.expressions.aggregate.Complete`.
   Later, I found Hive's optimization for handling this scenario.
   This PR uses Hive's heuristic (with a default minRows of 100000 rows to be 
sampled).
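
   For illustration only, here is a minimal standalone sketch of this kind of 
sampling heuristic. It is not the code in this PR: the class name, the method 
names, and the 0.5 ratio threshold are assumptions made for the example. Only 
the 100000-row sample size comes from the comment above.

```python
class PartialAggSkipHeuristic:
    """Decide whether partial aggregation is worth keeping.

    Hypothetical sketch: sample the first `min_rows` input rows, count the
    distinct grouping keys among them, and skip partial aggregation when
    the keys barely reduce the row count.
    """

    def __init__(self, min_rows=100_000, max_ratio=0.5):
        self.min_rows = min_rows      # rows to sample before deciding
        self.max_ratio = max_ratio    # assumed threshold, not from the PR
        self.rows_seen = 0
        self.groups = set()
        self.decided = False
        self.skip = False

    def observe(self, key):
        """Feed one grouping key; return True once skipping is advised."""
        if self.decided:
            return self.skip
        self.rows_seen += 1
        self.groups.add(key)
        if self.rows_seen >= self.min_rows:
            # High distinct-keys/rows ratio means little reduction.
            ratio = len(self.groups) / self.rows_seen
            self.skip = ratio > self.max_ratio
            self.decided = True
            self.groups.clear()       # the sample is no longer needed
        return self.skip


def should_skip(keys, min_rows=100_000, max_ratio=0.5):
    """Run the heuristic over an iterable of grouping keys."""
    heuristic = PartialAggSkipHeuristic(min_rows, max_ratio)
    for key in keys:
        heuristic.observe(key)
    return heuristic.skip
```

   With a small sample size for demonstration, `should_skip(range(10), 
min_rows=10)` advises skipping because every key is distinct, while 
`should_skip([i % 2 for i in range(10)], min_rows=10)` does not, because ten 
rows collapse into two groups. If fewer than `min_rows` rows arrive, no 
decision is made and partial aggregation stays enabled.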
    


