GitHub user ericl commented on a diff in the pull request:
https://github.com/apache/spark/pull/16424#discussion_r94170967
--- Diff: docs/sql-programming-guide.md ---
@@ -526,11 +526,18 @@ By default `saveAsTable` will create a "managed
table", meaning that the locatio
be controlled by the metastore. Managed tables will also have their data
deleted automatically
when a table is dropped.
-Currently, `saveAsTable` does not expose an API supporting the creation of
an "External table" from a `DataFrame`,
-however, this functionality can be achieved by providing a `path` option
to the `DataFrameWriter` with `path` as the key
-and location of the external table as its value (String) when saving the
table with `saveAsTable`. When an External table
+Currently, `saveAsTable` does not expose an API supporting the creation of
an "external table" from a `DataFrame`,
+however. This functionality can be achieved by providing a `path` option
to the `DataFrameWriter` with `path` as the key
+and location of the external table as its value (a string) when saving the
table with `saveAsTable`. When an External table
is dropped only its metadata is removed.
+Starting from Spark 2.1, persistent datasource tables have per-partition
metadata stored in the Hive metastore. This brings several benefits:
+
+- Since full information of all partitions can be retrieved from
metastore, excessive partition discovery is no longer needed. This greatly
saves query planning time for partitioned tables with a large number of
partitions.
+- Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now
available for tables created with the Datasource API.
+
+Note that partition information is not gathered by default when creating
an external datasource tables (those with a `path` option). You may want to
invoke `MSCK REPAIR TABLE` to trigger partition discovery and persist
per-partition information into metastore before querying a created external
table.
--- End diff --
s/an external/external
To sync the partition information in the metastore, you can invoke `MSCK REPAIR TABLE`.
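
For illustration, the workflow that note describes might look roughly like the PySpark sketch below. It assumes an active `SparkSession` (Spark 2.1+ with Hive support) passed in as `spark`, and partitioned files already present under `path`. The helper name `register_external_table` and its parameters are hypothetical, and the table name and path are interpolated without quoting or escaping.

```python
def register_external_table(spark, table_name, path, source="parquet"):
    """Create an external datasource table over existing files, then
    persist its partition information into the metastore.

    Assumptions: `spark` is an active SparkSession; `table_name`, `path`
    and `source` are illustrative placeholders, not values from the PR.
    """
    # Supplying a `path` option makes the table external, so dropping it
    # later removes only the metadata and leaves the files in place.
    spark.sql(
        f"CREATE TABLE {table_name} USING {source} OPTIONS (path '{path}')"
    )
    # Partition information is not gathered when the table is created,
    # so trigger partition discovery and store the result in the metastore.
    spark.sql(f"MSCK REPAIR TABLE {table_name}")
```

A call such as `register_external_table(spark, "events", "/data/events")` would issue both statements in order; the names used here are placeholders only.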