vinodkc opened a new pull request #32411:
URL: https://github.com/apache/spark/pull/32411
### What changes were proposed in this pull request?
CTAS with location clause acts as an insert overwrite. This can cause
problems when there subdirectories within a location directory.
This causes some users to accidentally wipe out directories with very
important data. We should not allow CTAS with location to a non-empty directory.
### Why are the changes needed?
Hive already handled this scenario: HIVE-11319
Steps to reproduce:
sql("""create external table `demo_CTAS`( `comment` string) PARTITIONED BY
(`col1` string, `col2` string) STORED AS parquet location
'/tmp/u1/demo_CTAS'""")
sql("""INSERT OVERWRITE TABLE demo_CTAS partition (col1='1',col2='1') VALUES
('abc')""")
sql("select* from demo_CTAS").show
sql("""create table ctas1 location '/tmp/u2/ctas1' as select * from
demo_CTAS""")
sql("select* from ctas1").show
sql("""create table ctas2 location '/tmp/u2' as select * from demo_CTAS""")
Before the fix: Both create table operations will succeed. But values in
table ctas1 will be replaced by ctas2 accidentally.
After the fix: `create table ctas2...` will throw AnalysisException
org.apache.spark.sql.AnalysisException: CREATE-TABLE-AS-SELECT cannot create
table with location to a non-empty directory /tmp/u2 .
### Does this PR introduce _any_ user-facing change?
Yes, if the location directory is not empty, CTAS with location will throw
AnalysisException
sql("""create table ctas2 location '/tmp/u2' as select * from demo_CTAS""")
org.apache.spark.sql.AnalysisException: CREATE-TABLE-AS-SELECT cannot create
table with location to a non-empty directory /tmp/u2 .
### How was this patch tested?
Test case added in SQLQuerySuite.scala
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]