wangyum opened a new pull request #24596: [SPARK-27694][SQL] CTAS created data source table should update statistics if spark.sql.statistics.size.autoUpdate.enabled is enabled
URL: https://github.com/apache/spark/pull/24596
 
 
   ## What changes were proposed in this pull request?
   
   How to reproduce:
   ```sql
   bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
   
   spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
   spark-sql> desc formatted spark_27694;
   a    string  NULL
   b    string  NULL
   
   # Detailed Table Information
   Database     default
   Table        spark_27694
   Owner        yumwang
   Created Time Tue May 14 10:38:25 CST 2019
   Last Access  Thu Jan 01 08:00:00 CST 1970
   Created By   Spark 2.4.0
   Type MANAGED
   Provider     parquet
   Table Properties     [transient_lastDdlTime=1557801505]
   Location     file:/user/hive/warehouse/spark_27694
   Serde Library        org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
   InputFormat  org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
   OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
   Storage Properties   [serialization.format=1]
   ```
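   The output above contains no `Statistics` entry even though `spark.sql.statistics.size.autoUpdate.enabled` is set. This PR fixes the issue so that a data source table created by CTAS has its statistics updated.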
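   
   As a stopgap for tables that were already created without statistics, the size can be computed by hand. A minimal sketch, assuming Spark SQL's standard `ANALYZE TABLE` command (this workaround is not part of the patch):
   ```sql
   -- Compute size-only statistics without scanning the data.
   ANALYZE TABLE spark_27694 COMPUTE STATISTICS NOSCAN;
   -- The output should now contain a Statistics entry.
   DESC FORMATTED spark_27694;
   ```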
   
   ## How was this patch tested?
   
   Unit tests and a manual test. With the patch, the output includes a `Statistics` entry:
   ```
   bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true -S
   
   spark-sql> CREATE TABLE spark_27694 USING parquet AS SELECT 'a', 'b';
   spark-sql> DESC FORMATTED spark_27694;
   a    string  NULL
   b    string  NULL
   
   # Detailed Table Information
   Database     default
   Table        spark_27694
   Owner        root
   Created Time Mon May 13 19:45:33 GMT-07:00 2019
   Last Access  Wed Dec 31 17:00:00 GMT-07:00 1969
   Created By   Spark 3.0.0-SNAPSHOT
   Type MANAGED
   Provider     parquet
   Statistics   561 bytes
   Location     file:/user/hive/warehouse/spark_27694
   Serde Library        org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
   InputFormat  org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
   OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
   ```

