[GitHub] [spark] HeartSaVioR commented on pull request #35552: [SPARK-38237][SQL][SS] Introduce a new config to require all cluster keys on Aggregate

GitBox Sat, 19 Feb 2022 18:55:08 -0800


HeartSaVioR commented on pull request #35552:
URL: https://github.com/apache/spark/pull/35552#issuecomment-1046148556



   > I feel that introducing more manually tuned configs to let users disable 
partial aggregate does not work at scale. I am actually in favor of the query 
engine adaptively optimizing under the hood, instead of leaving it to users 
to tune. I feel a better approach is to adaptively disable partial aggregate 
during runtime if the reduction ratio is low - 
https://github.com/apache/spark/pull/28804#issuecomment-854089520 .
   
   The config basically assumes the case where the query engine is not able to 
handle this smartly. If users find that the output partitioning before 
aggregation is skewed and the query still has to aggregate, it is pretty clear 
that partial aggregate does not help. I even doubt we have to be adaptive for 
this case, unless the condition for being adaptive can be determined without 
requiring actual execution.
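
   For illustration only, here is a minimal Python sketch of what a "reduction ratio" check could look like. It is not Spark code: the function names and the `MIN_REDUCTION` threshold are hypothetical, and the per-key count stands in for whatever partial aggregate a real query would compute. Note that the ratio can only be measured by reading the rows, which is the "requires actual execution" concern above.

```python
from collections import Counter

# Hypothetical threshold for illustration; this is not a Spark setting.
MIN_REDUCTION = 0.5


def partial_aggregate(partition):
    """Collapse one partition's (key, value) rows into a per-key row count."""
    return Counter(key for key, _ in partition)


def reduction_ratio(partition):
    """Fraction of rows removed by partial aggregation (0.0 means no reduction)."""
    if not partition:
        return 0.0
    return 1.0 - len(partial_aggregate(partition)) / len(partition)


def should_keep_partial_aggregate(partition, min_reduction=MIN_REDUCTION):
    """Keep the partial aggregate only if it removes enough rows to pay off."""
    return reduction_ratio(partition) >= min_reduction


# 10 rows but only 2 distinct keys: partial aggregation removes most rows.
many_duplicates = [("a", 1)] * 8 + [("b", 1)] * 2
# 10 rows and 10 distinct keys: partial aggregation removes nothing.
mostly_unique = [(f"k{i}", 1) for i in range(10)]

print(should_keep_partial_aggregate(many_duplicates))
print(should_keep_partial_aggregate(mostly_unique))
```

   In this toy model the decision depends on data seen at runtime, whereas a static config makes the same choice before any rows are processed.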


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


