Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/16938

From what I understand, this change is applicable to EXTERNAL tables only. There are two main uses of EXTERNAL tables I am aware of (repost from https://github.com/apache/spark/pull/16868#issuecomment-279282420):

- Ingest data from non-Hive locations into Hive tables.
- Create a logical "pointer" to an existing Hive table / partition (without creating multiple copies of the underlying data).

The ability to point at an arbitrary location (which already has data) and create an EXTERNAL table over it is important for supporting EXTERNAL tables. If we don't allow this PR, the options left to users are:

- Create an external table pointing to some non-existing location.
- Later, do either of these two things:
  - issue `ALTER TABLE SET LOCATION` to set the external table's location to the source location holding the desired data; or
  - do a `dfs -mv` from the source location of the data to the new location the table points at. This will be nasty if your source data was at a managed table's location.

@cloud-fan : I don't think Spark's interpretation of EXTERNAL tables is different from Hive's. If it is, can you share the differences? I think we should allow this. If you have specific concerns, let's discuss those.
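For illustration, the two paths above can be sketched in Spark SQL / HiveQL. All table names, columns, and HDFS paths here are hypothetical, and the first statement assumes the behavior this PR would permit:

```sql
-- Preferred path (what this PR enables): create an EXTERNAL table
-- directly over a location that already contains data.
CREATE EXTERNAL TABLE logs (id INT, msg STRING)
STORED AS PARQUET
LOCATION 'hdfs://namenode/data/existing_logs';

-- Workaround path if that is disallowed: first create the external
-- table over a non-existing (empty) location...
CREATE EXTERNAL TABLE logs_workaround (id INT, msg STRING)
STORED AS PARQUET
LOCATION 'hdfs://namenode/data/empty_dir';

-- ...then repoint it at the source location that holds the real data.
ALTER TABLE logs_workaround
SET LOCATION 'hdfs://namenode/data/existing_logs';
```

The alternative workaround, moving the data itself (e.g. `hdfs dfs -mv /data/existing_logs /data/empty_dir`), mutates the source directory, which is exactly the problem if that directory belongs to a managed table.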