Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/16424#discussion_r94170967 --- Diff: docs/sql-programming-guide.md --- @@ -526,11 +526,18 @@ By default `saveAsTable` will create a "managed table", meaning that the locatio be controlled by the metastore. Managed tables will also have their data deleted automatically when a table is dropped. -Currently, `saveAsTable` does not expose an API supporting the creation of an "External table" from a `DataFrame`, -however, this functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key -and location of the external table as its value (String) when saving the table with `saveAsTable`. When an External table +Currently, `saveAsTable` does not expose an API supporting the creation of an "external table" from a `DataFrame`, +however. This functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key +and location of the external table as its value (a string) when saving the table with `saveAsTable`. When an External table is dropped only its metadata is removed. +Starting from Spark 2.1, persistent datasource tables have per-partition metadata stored in the Hive metastore. This brings several benefits: + +- Since full information of all partitions can be retrieved from metastore, excessive partition discovery is no longer needed. This greatly saves query planning time for partitioned tables with a large number of partitions. +- Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API. + +Note that partition information is not gathered by default when creating an external datasource tables (those with a `path` option). You may want to invoke `MSCK REPAIR TABLE` to trigger partition discovery and persist per-partition information into metastore before querying a created external table. --- End diff -- s/an external/external To sync the partition information in the metastore, you can invoke `MSCK REPAIR TABLE`.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org