Re: [PR] [SPARK-45731][SQL] Also update partition statistics with `ANALYZE TABLE` command [spark]

via GitHub Wed, 01 Nov 2023 16:59:56 -0700


sunchao commented on PR #43629:
URL: https://github.com/apache/spark/pull/43629#issuecomment-1789852270


   Thanks @dongjoon-hyun for the quick reply.
   
   > According to the title and first sentence of PR description, is this related to another JIRA
   
   Not really. The title means that, in addition to updating table stats, this 
PR also proposes updating partition stats with the `ANALYZE TABLE` command.
   
   > Just a question. Why don't we use `REPAIR TABLE` before this?
   
   Hmm, I think `REPAIR TABLE` serves a different purpose: it is used to 
recover partitions for an existing table that was created from a directory 
containing sub-directories for partitions. On the other hand, `ANALYZE TABLE` can 
be used to update table and partition stats. For instance, a partition could 
already exist for a table, but its stats could be out of sync because, say, 
data was written to the partition directory without going through Spark.
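
To make the distinction concrete, here is a minimal sketch of the two statements side by side. The table name `logs`, the partition column `ds`, and the helper `maintenance_sql` are hypothetical and only for illustration. The `ANALYZE TABLE` statement uses an explicit partition spec, so it does not depend on the change proposed in this PR.

```python
def maintenance_sql(table, partition=None):
    """Build the SQL for the two distinct maintenance tasks.

    `table` and `partition` are illustrative inputs, not names from the PR.
    """
    spec = ""
    if partition:
        # Render {"ds": "2023-11-01"} as: PARTITION (ds='2023-11-01')
        pairs = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        spec = f" PARTITION ({pairs})"
    return {
        # Registers partition directories that the catalog does not know about yet.
        "recover_partitions": f"MSCK REPAIR TABLE {table}",
        # Recomputes stats for a partition that is already registered.
        "refresh_stats": f"ANALYZE TABLE {table}{spec} COMPUTE STATISTICS",
    }


stmts = maintenance_sql("logs", {"ds": "2023-11-01"})
for name, sql in stmts.items():
    print(f"{name}: {sql}")
    # With a live SparkSession this would be: spark.sql(sql)
```

The first statement addresses a missing partition, the second addresses stale stats on a partition that already exists, which is the case described above.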
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


