vinodkc opened a new pull request #32411:
URL: https://github.com/apache/spark/pull/32411


   ### What changes were proposed in this pull request?
   
   CTAS with location clause acts as an insert overwrite. This can cause 
problems when there subdirectories within a location directory.
   This causes some users to accidentally wipe out directories with very 
important data. We should not allow CTAS with location to a non-empty directory.
   
   ### Why are the changes needed?
   
   Hive already handled this scenario: HIVE-11319
   
   Steps to reproduce:
   
   sql("""create external table  `demo_CTAS`( `comment` string) PARTITIONED BY 
(`col1` string, `col2` string) STORED AS parquet location 
'/tmp/u1/demo_CTAS'""")
   sql("""INSERT OVERWRITE TABLE demo_CTAS partition (col1='1',col2='1') VALUES 
('abc')""")
   sql("select* from demo_CTAS").show
   sql("""create table ctas1 location '/tmp/u2/ctas1' as select * from 
demo_CTAS""")
   sql("select* from ctas1").show
   sql("""create table ctas2 location '/tmp/u2' as select * from demo_CTAS""")
   
   Before the fix: Both create table operations will succeed. But values in 
table ctas1 will be replaced by ctas2 accidentally.
   
   After the fix: `create table ctas2...` will throw AnalysisException
   org.apache.spark.sql.AnalysisException: CREATE-TABLE-AS-SELECT cannot create 
table with location to a non-empty directory /tmp/u2 .
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, if the location directory is not empty, CTAS with location will throw 
AnalysisException
   
   sql("""create table ctas2 location '/tmp/u2' as select * from demo_CTAS""")
   org.apache.spark.sql.AnalysisException: CREATE-TABLE-AS-SELECT cannot create 
table with location to a non-empty directory /tmp/u2 .
   
   ### How was this patch tested?
   Test case added in SQLQuerySuite.scala
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to