Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16424#discussion_r94170967
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -526,11 +526,18 @@ By default `saveAsTable` will create a "managed table", meaning that the locatio
     be controlled by the metastore. Managed tables will also have their data deleted automatically
     when a table is dropped.
     
    -Currently, `saveAsTable` does not expose an API supporting the creation of an "External table" from a `DataFrame`, 
    -however, this functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key 
    -and location of the external table as its value (String) when saving the table with `saveAsTable`. When an External table 
    +Currently, `saveAsTable` does not expose an API supporting the creation of an "external table" from a `DataFrame`,
    +however. This functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key
    +and location of the external table as its value (a string) when saving the table with `saveAsTable`. When an External table
     is dropped only its metadata is removed.
     
    +Starting from Spark 2.1, persistent datasource tables have per-partition metadata stored in the Hive metastore. This brings several benefits:
    +
    +- Since full information of all partitions can be retrieved from metastore, excessive partition discovery is no longer needed. This greatly saves query planning time for partitioned tables with a large number of partitions.
    +- Hive DDLs such as `ALTER TABLE PARTITION ... SET LOCATION` are now available for tables created with the Datasource API.
    +
    +Note that partition information is not gathered by default when creating an external datasource tables (those with a `path` option). You may want to invoke `MSCK REPAIR TABLE` to trigger partition discovery and persist per-partition information into metastore before querying a created external table.
    --- End diff --
    
    s/an external/external
    
    To sync the partition information in the metastore, you can invoke `MSCK REPAIR TABLE`.


