GitHub user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/15983
@cloud-fan
First, if this behavior change is required, we need to document it. Because
`spark.sql.hive.manageFilesourcePartitions` is set to true by default, external
users might not realize that the underlying behavior has changed.
Second, let us discuss the existing user interface for creating data source
tables.
- **CREATE EXTERNAL data source TABLE**: not allowed.
- **CREATE MANAGED non-partitioned data source table without a path specified
in option**: an empty table
- **CREATE MANAGED non-partitioned data source table with a path specified
in option**: the data in the path is visible. No need to repair the table.
- **CREATE MANAGED partitioned data source table without a path specified
in option**: an empty table
- **CREATE MANAGED partitioned data source table with a path specified
in option**:
-- If `spark.sql.hive.manageFilesourcePartitions` is set to false, the
data in the path is visible.
-- If `spark.sql.hive.manageFilesourcePartitions` is set to true, the
data in the path is not visible. If users repair the table, they can see all
the partitions. Without the repair, if they append new rows to a partition,
the existing rows in that partition become visible along with the newly
inserted rows.
- **CREATE MANAGED partitioned Hive serde table with a LOCATION specified**:
automatically converted to an **EXTERNAL** table in Spark 2.0+, although Hive
itself allows a managed table with a location.
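To make the cases above easier to scan, here is a minimal sketch that encodes
them as a lookup function. This is only a model of the behaviors listed in this
comment, not Spark code: the function name `create_table_outcome`, its
parameters, and the outcome strings are all hypothetical, and combinations the
list does not mention are reported as "not covered".

```python
def create_table_outcome(provider, external, partitioned, location_given,
                         manage_filesource_partitions=True):
    """Return the outcome listed above for one CREATE TABLE variant.

    provider: "datasource" or "hive" (hypothetical labels for this sketch).
    manage_filesource_partitions: models the value of
    spark.sql.hive.manageFilesourcePartitions (true by default).
    """
    if provider == "hive":
        # The list only describes one Hive serde case.
        if not external and partitioned and location_given:
            return "converted to EXTERNAL"
        return "not covered"
    if provider != "datasource":
        raise ValueError("unknown provider: %r" % (provider,))

    if external:
        # CREATE EXTERNAL data source TABLE
        return "not allowed"
    if not location_given:
        # Managed table without a path, partitioned or not
        return "empty table"
    if partitioned and manage_filesource_partitions:
        # Partitions are not picked up until the table is repaired
        return "data not visible until repair"
    # Non-partitioned with a path, or partitioned with the flag set to false
    return "data visible"
```

For example, `create_table_outcome("datasource", False, True, True)` returns
`"data not visible until repair"`, which is the one case that depends on
`spark.sql.hive.manageFilesourcePartitions`.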
It is pretty complex to document/remember the above behaviors.
- Should we support `EXTERNAL` for data source tables? As we did for Hive
serde tables, should we convert a table to `EXTERNAL` when users specify the
`path` in `option`?
- If we decide to change the behaviors, should we follow the behavior of
Hive's MANAGED tables when users specify a location? Should we ignore the
existing data in the specified path/location?