Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/16938
  
    From what I understand, this change is applicable for EXTERNAL tables only.
    
    There are two main uses of EXTERNAL tables I am aware of (repost from 
https://github.com/apache/spark/pull/16868#issuecomment-279282420):
    - Ingest data from non-hive locations into Hive tables.
    - Create a logical "pointer" to an existing hive table / partition (without 
creating multiple copies of the underlying data).
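    
    As a concrete sketch of the second use case (table and path names below are
hypothetical, not from this PR), an EXTERNAL table acts as a metadata-only pointer:
    
    ```sql
    -- Hypothetical example: expose an existing data directory (e.g. a partition
    -- written by another table or job) through a separate EXTERNAL table,
    -- without copying the underlying files.
    CREATE EXTERNAL TABLE events_snapshot (id BIGINT, payload STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/events/ds=2017-02-01';
    
    -- Dropping an EXTERNAL table removes only the metastore entry;
    -- the files under LOCATION are left untouched.
    DROP TABLE events_snapshot;
    ```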
    
    The ability to point at an arbitrary location (which already contains data) and
create an EXTERNAL table over it is important for supporting EXTERNAL tables. If we
don't allow this PR, the only options left to users are:
    - Create an external table pointing to some non-existent location.
    - Later, do either of these two things:
      - issue `ALTER TABLE SET LOCATION` to point the external table at the source
location containing the desired data.
      - do a `dfs -mv` from the source location of the data to the new location
the table points at. This is nasty if the source data lives under a managed
table's location, since moving those files out from under the managed table
corrupts it.
    
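    For concreteness, the first workaround above would look roughly like this
(table and path names are hypothetical):
    
    ```sql
    -- Without this PR: create the external table over a placeholder location
    -- that does not yet exist...
    CREATE EXTERNAL TABLE staged_logs (line STRING)
    LOCATION '/tmp/staged_logs_placeholder';
    
    -- ...then repoint it at the directory that actually holds the data.
    ALTER TABLE staged_logs SET LOCATION '/data/ingest/raw_logs';
    
    -- With this PR, the table could instead be created over the existing
    -- data in one step:
    -- CREATE EXTERNAL TABLE staged_logs (line STRING)
    -- LOCATION '/data/ingest/raw_logs';
    ```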
    @cloud-fan : I don't think Spark's interpretation of EXTERNAL tables differs
from Hive's. If it does, can you share the differences? I think we should allow
this. If you have specific concerns, let's discuss them.

