[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...

gatorsmile Sun, 18 Dec 2016 10:09:32 -0800

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/15983
  
    @cloud-fan 
    
    First, if this behavior change is required, we need to document it. I do not 
think it is obvious to external users, who may not realize that the underlying 
behavior changed because `spark.sql.hive.manageFilesourcePartitions` is set to 
true by default. 
    
    Second, let us discuss the existing user interface for creating data 
source tables. 
    - **CREATE EXTERNAL data source TABLE**: not allowed. 
    - **CREATE MANAGED non-partitioned data source table without specifying the 
path in the options**: an empty table.
    - **CREATE MANAGED non-partitioned data source table with the path specified 
in the options**: the data in the path is visible. No need to repair the table.
    - **CREATE MANAGED partitioned data source table without specifying the 
path in the options**: an empty table.
    - **CREATE MANAGED partitioned data source table with the path specified 
in the options**: 
      -- If `spark.sql.hive.manageFilesourcePartitions` is set to false, the 
data in the path is visible. 
      -- If `spark.sql.hive.manageFilesourcePartitions` is set to true, the 
data in the path is not visible. If users repair the table, they can see all the 
partitions. Without the repair, if they append new rows to a partition, the 
existing rows in that partition become visible along with the newly inserted 
rows.
    - **CREATE MANAGED partitioned Hive serde table with the LOCATION 
specified**: automatically converted to an **EXTERNAL** table in Spark 2.0+, 
although Hive itself allows a MANAGED table with a location. 
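
The data source table cases listed above can be condensed into a small lookup. This is only a sketch of the behavior as described in this comment, not Spark code; the function name `existing_data_visibility` and its return strings are hypothetical and exist purely for illustration.

```python
def existing_data_visibility(partitioned, path_given,
                             manage_partitions=True, external=False):
    """Summarize what a user sees right after creating a data source table.

    Hypothetical helper: it encodes only the cases listed in this comment.
    `manage_partitions` stands in for the value of
    spark.sql.hive.manageFilesourcePartitions (true by default).
    """
    if external:
        # CREATE EXTERNAL data source TABLE is rejected.
        return "not allowed"
    if not path_given:
        # No path in the options: the table starts out empty.
        return "empty table"
    if partitioned and manage_partitions:
        # Partitions are tracked in the catalog, so files already in the
        # path stay hidden until the table is repaired.
        return "hidden until repair"
    # Non-partitioned table, or partition management turned off.
    return "visible"


if __name__ == "__main__":
    for partitioned in (False, True):
        for path_given in (False, True):
            for manage in (False, True):
                print(partitioned, path_given, manage,
                      existing_data_visibility(partitioned, path_given, manage))
```

Only one of the combinations depends on the configuration flag, which is the case that is easy to miss without documentation.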
    
    It is pretty complex to document/remember the above behaviors.
    - Should we support `EXTERNAL` for data source tables? As we did for Hive 
serde tables, should we convert a table to `EXTERNAL` when users specify the 
`path` in the options?
    - If we decide to change the behaviors, should we follow Hive's behavior for 
MANAGED tables when users specify a location, that is, ignore the existing data 
in the specified path/location?
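
For anyone who wants to check the partitioned case with a path against their own build, a minimal sketch follows. It assumes `spark` is a `pyspark.sql.SparkSession` with Hive support, that `path` already contains Parquet files laid out in `part=...` directories, and that the files match the made-up schema `(value INT, part INT)`. The function name and the schema are hypothetical; the table name and path are interpolated into the SQL text only for illustration.

```python
def count_before_and_after_repair(spark, table, path):
    """Create a partitioned data source table over existing files and
    return the row counts seen before and after repairing the table.

    Assumption: `spark` behaves like a pyspark.sql.SparkSession.
    """
    # Partitioned data source table with the path given in the options.
    spark.sql(
        f"CREATE TABLE {table} (value INT, part INT) USING parquet "
        f"OPTIONS (path '{path}') PARTITIONED BY (part)"
    )
    before = spark.table(table).count()

    # Register the partitions found under the path in the catalog.
    spark.sql(f"MSCK REPAIR TABLE {table}")
    after = spark.table(table).count()
    return before, after
```

If the behavior is as described above, the two counts should differ when `spark.sql.hive.manageFilesourcePartitions` is true and match when it is false.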



