wangyum opened a new pull request #24596: [SPARK-27694][SQL] CTAS created data source table should update statistics if spark.sql.statistics.size.autoUpdate.enabled is enabled
URL: https://github.com/apache/spark/pull/24596

## What changes were proposed in this pull request?

A data source table created with CTAS does not have its size statistics updated, even when `spark.sql.statistics.size.autoUpdate.enabled` is enabled.

How to reproduce:

```sql
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
spark-sql> desc formatted spark_27694;
a                   string    NULL
b                   string    NULL

# Detailed Table Information
Database            default
Table               spark_27694
Owner               yumwang
Created Time        Tue May 14 10:38:25 CST 2019
Last Access         Thu Jan 01 08:00:00 CST 1970
Created By          Spark 2.4.0
Type                MANAGED
Provider            parquet
Table Properties    [transient_lastDdlTime=1557801505]
Location            file:/user/hive/warehouse/spark_27694
Serde Library       org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat         org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat        org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties  [serialization.format=1]
```

The output above has no `Statistics` row. This PR fixes this issue.

## How was this patch tested?

Unit tests and manual tests:

```
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
spark-sql> DESC FORMATTED spark_27694;
a                   string    NULL
b                   string    NULL

# Detailed Table Information
Database            default
Table               spark_27694
Owner               root
Created Time        Mon May 13 19:45:33 GMT-07:00 2019
Last Access         Wed Dec 31 17:00:00 GMT-07:00 1969
Created By          Spark 3.0.0-SNAPSHOT
Type                MANAGED
Provider            parquet
Statistics          561 bytes
Location            file:/user/hive/warehouse/spark_27694
Serde Library       org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat         org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat        org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
```
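
For anyone reproducing this on a build without the fix, one possible workaround is to compute the size statistics manually after the CTAS. This is only a sketch and is not part of the patch. It assumes the same `spark-sql` session and table name as the reproduction, and relies on Spark SQL's standard `ANALYZE TABLE` command:

```sql
-- Assumption: same session and table name as in the reproduction steps.
CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';

-- NOSCAN collects only the table size in bytes; it does not scan rows.
ANALYZE TABLE spark_27694 COMPUTE STATISTICS NOSCAN;

-- A "Statistics" row should now be listed under the detailed table information.
DESC FORMATTED spark_27694;
```

The reported byte count depends on the files written, so it may differ from the value shown in the manual test.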
