[GitHub] [spark] HeartSaVioR edited a comment on pull request #35552: [SPARK-38237][SQL][SS] Introduce a new config to require all cluster keys on Aggregate

GitBox Sat, 19 Feb 2022 18:56:13 -0800


HeartSaVioR edited a comment on pull request #35552:
URL: https://github.com/apache/spark/pull/35552#issuecomment-1046148556



   > I feel that introducing more manually tuned configs to let users disable 
partial aggregation does not work at scale. I am actually in favor of the query 
engine adaptively optimizing under the hood, instead of leaving users 
to tune. I feel a better approach is to adaptively disable partial aggregation 
at runtime if the reduction ratio is low: 
https://github.com/apache/spark/pull/28804#issuecomment-854089520 .
   
   The config basically targets the case where the query engine is not able to 
handle this smartly. If users find that the output partitioning before the 
aggregation is skewed and the aggregation is still required, in most cases it is 
pretty clear that partial aggregation does not help. I even doubt we need to be 
adaptive for this case, unless the condition for being adaptive can be 
determined without actual execution.
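The adaptive alternative suggested in the quoted comment can be sketched as a runtime check on how much a partial aggregate actually shrinks its input. What follows is a minimal Python sketch that assumes hypothetical names (`PartialAggStats`, `reduction_ratio`, `should_disable_partial_agg`) and an arbitrary 10% threshold. It is not Spark's implementation or API. It also shows why the reply is skeptical: the decision depends on row counts that only exist once execution has started.

```python
from dataclasses import dataclass


@dataclass
class PartialAggStats:
    """Hypothetical counters a runtime might collect from a sample of input rows."""
    input_rows: int   # rows fed into the partial aggregate
    output_rows: int  # distinct groups it produced from those rows


def sample_stats(keys):
    """Build stats from a sample of grouping keys (one group per distinct key)."""
    return PartialAggStats(input_rows=len(keys), output_rows=len(set(keys)))


def reduction_ratio(stats: PartialAggStats) -> float:
    """Fraction of input rows eliminated by the partial step (0.0 = no reduction)."""
    if stats.input_rows == 0:
        return 0.0
    return 1.0 - stats.output_rows / stats.input_rows


def should_disable_partial_agg(stats: PartialAggStats, min_reduction: float = 0.1) -> bool:
    """Skip partial aggregation when it removes fewer than min_reduction of the rows.

    min_reduction is an assumed, illustrative threshold, not a Spark default.
    """
    if stats.input_rows == 0:
        return False  # no evidence yet; keep the default plan
    return reduction_ratio(stats) < min_reduction


if __name__ == "__main__":
    high_cardinality = sample_stats(list(range(1000)))           # every key distinct
    low_cardinality = sample_stats([i % 10 for i in range(1000)])  # only 10 groups
    print(should_disable_partial_agg(high_cardinality))
    print(should_disable_partial_agg(low_cardinality))
```

With high-cardinality keys such as `list(range(1000))`, every row is its own group. The reduction is 0.0, so the sketch would skip the partial step. With keys `i % 10`, the reduction is 0.99 and the partial aggregate is kept.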
