HeartSaVioR edited a comment on pull request #35552: URL: https://github.com/apache/spark/pull/35552#issuecomment-1046148556
> I feel introducing more manually tuned config to allow user to disable partial aggregate, is not working at scale. I am actually in favor of query engine to adaptively make optimization under the hood, instead of leaving users to tune. I feel a better approach is to adaptively disable partial aggregate during runtime if reduction ratio is low - https://github.com/apache/spark/pull/28804#issuecomment-854089520.

The config basically covers the case where the query engine is not able to handle this smartly on its own. If users find that the output partitioning before the aggregation is skewed and the aggregation still has to happen, in most cases it is pretty clear that partial aggregation does not help. I even doubt we need to be adaptive for this case, unless the condition for being adaptive can be determined without requiring actual execution.
