[GitHub] [spark] kazuyukitanimura opened a new pull request, #36067: [SPARK-38573][SQL] Support Auto Partition Level Statistics Collection

GitBox Mon, 04 Apr 2022 16:59:54 -0700


kazuyukitanimura opened a new pull request, #36067:
URL: https://github.com/apache/spark/pull/36067


   ### What changes were proposed in this pull request?
   Currently, SPARK-21127 (https://issues.apache.org/jira/browse/SPARK-21127) 
supports automatically storing aggregated stats at the table level when the 
config `spark.sql.statistics.size.autoUpdate.enabled` is enabled.
   
   This PR proposes to also update partition-level statistics automatically 
whenever the `spark.sql.statistics.size.autoUpdate.enabled` config is enabled.
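
As a rough illustration of how the config might be used, the sketch below turns it on for a session and builds a `DESCRIBE EXTENDED ... PARTITION` statement for inspecting a single partition. The helper names, the `sales` table, and the `dt` column are hypothetical and not part of this PR; the `spark` argument is assumed to be an existing `SparkSession`, and partition values are assumed to be quotable as strings.

```python
# Config key named in this PR description.
AUTO_UPDATE_KEY = "spark.sql.statistics.size.autoUpdate.enabled"


def enable_auto_size_stats(spark):
    """Enable automatic size-statistics updates on an existing session.

    `spark` is assumed to expose `conf.set(key, value)`, as SparkSession does.
    """
    spark.conf.set(AUTO_UPDATE_KEY, "true")


def describe_partition_sql(table, partition_spec):
    """Build a DESCRIBE EXTENDED statement for one partition.

    `partition_spec` maps partition column names to values (hypothetical
    helper; values are quoted as string literals for simplicity).
    """
    spec = ", ".join(f"{col} = '{val}'" for col, val in partition_spec.items())
    return f"DESCRIBE EXTENDED {table} PARTITION ({spec})"
```

With a real session, something like `enable_auto_size_stats(spark)` followed by `spark.sql(describe_partition_sql("sales", {"dt": "2022-04-04"})).show(truncate=False)` would show whatever statistics are stored for that partition after a write.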
   
   ### Why are the changes needed?
   Partition-level stats are useful for identifying outlier (skewed) 
partitions, and the query optimizer works better with partition-level stats 
when partition pruning applies.
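
To make the skew use case concrete, here is a minimal sketch of how per-partition sizes could be used to flag outliers. It is not code from this PR: the function name, the input shape (a dict of partition name to size in bytes), and the "larger than `factor` times the median" rule are all assumptions chosen for illustration.

```python
from statistics import median


def find_skewed_partitions(sizes_in_bytes, factor=5.0):
    """Return partition names whose size exceeds `factor` times the median.

    `sizes_in_bytes` maps a partition name to its size in bytes, e.g. as
    read from stored partition-level statistics (hypothetical input shape).
    """
    if not sizes_in_bytes:
        return []
    threshold = factor * median(sizes_in_bytes.values())
    return sorted(
        name for name, size in sizes_in_bytes.items() if size > threshold
    )
```

For example, with three partitions of roughly 100 bytes and one of 5000 bytes, only the 5000-byte partition would be reported.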
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Updated unit tests
   

