karuppayya commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-675060833
@cloud-fan
We observed this behaviour (partial aggregation not helping) at one of our
customers.
Initially, I disabled partial aggregation completely by setting the
aggregate mode to `org.apache.spark.sql.catalyst.expressions.aggregate.Complete`.
Later, I found that Hive has an optimization for handling this scenario.
This PR uses Hive's heuristic (with a default minRows of 100000 rows to be
sampled).
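
For readers unfamiliar with this kind of heuristic, here is a rough sketch of the idea, not the code in this PR or in Hive. It aggregates per key until `min_rows` input rows have been seen, then checks how many distinct keys the hash map holds. If the map has not shrunk the input enough, it flushes the map and passes the remaining rows through unaggregated. The function name, the `max_ratio` parameter, its 0.5 value, and the single check point are all assumptions made for illustration.

```python
def partial_aggregate(rows, min_rows=100_000, max_ratio=0.5):
    """Partially sum values per key, falling back to pass-through when
    the partial aggregation does not reduce the data.

    Hypothetical sketch: the names and the 0.5 ratio are assumptions.
    Returns (output, skipped), where output is a list of (key, partial_sum)
    pairs and skipped tells whether the fallback was triggered.
    """
    sums = {}        # hash map of partial sums per key
    output = []      # rows handed to the final aggregation
    seen = 0         # input rows consumed while aggregating
    skipped = False  # True once partial aggregation is abandoned

    for key, value in rows:
        if skipped:
            # Fallback: emit the row as-is, no hashing work.
            output.append((key, value))
            continue

        seen += 1
        sums[key] = sums.get(key, 0) + value

        # Check once, after the sample of min_rows rows: if the number of
        # distinct keys is close to the number of rows, hashing buys nothing.
        if seen == min_rows and len(sums) > max_ratio * seen:
            skipped = True
            output.extend(sums.items())  # flush what was aggregated so far
            sums.clear()

    output.extend(sums.items())  # flush remaining partial sums, if any
    return output, skipped
```

With mostly unique keys the fallback triggers and the output has as many rows as the input; with few distinct keys the map stays small and only one row per key is emitted. Either way the final aggregation still produces the correct result, since it merges whatever partial rows it receives.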