GitHub user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13270#discussion_r65271411
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -368,14 +371,27 @@ private[hive] class HiveClientImpl(
             createTime = h.getTTable.getCreateTime.toLong * 1000,
             lastAccessTime = h.getLastAccessTime.toLong * 1000,
             storage = CatalogStorageFormat(
    -          locationUri = shim.getDataLocation(h),
    +          locationUri = shim.getDataLocation(h).filterNot { _ =>
    +            // SPARK-15269: Persisted data source tables always store the location URI as a SerDe
    +            // property named "path" instead of standard Hive `dataLocation`, because Hive only
    +            // allows directory paths as location URIs while Spark SQL data source tables also
    +            // allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
    +            // data source tables.
    +            DDLUtils.isDatasourceTable(properties) &&
    +              h.getTableType == HiveTableType.EXTERNAL_TABLE &&
    +              // Spark SQL may also save external data source in Hive compatible format when
    +              // possible, so that these tables can be directly accessed by Hive. For these tables,
    +              // `dataLocation` is still necessary. Here we also check for input format class
    +              // because only these Hive compatible tables set this field.
    +              h.getInputFormatClass == null
    +          },
    --- End diff --
    
    This is because we have to store a placeholder location URI in the metastore for external data source tables, and I'd like to avoid exposing it to user space.
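
A minimal sketch of that intent, assuming simplified stand-in types: `TableInfo`, `LocationFilter`, and `visibleLocation` are hypothetical names for illustration, not Spark's or Hive's actual classes. It only shows how `Option.filterNot` with a predicate that ignores its argument can hide the stored location for one category of tables while leaving it visible for the rest.

```scala
// Hypothetical stand-in for the metastore table metadata; not Spark's real API.
final case class TableInfo(
    location: Option[String],
    isDatasourceTable: Boolean,
    isExternal: Boolean,
    inputFormat: Option[String])

object LocationFilter {
  // Hide the location only for external data source tables that were not
  // saved in a Hive compatible format (no input format class is set).
  def visibleLocation(t: TableInfo): Option[String] =
    t.location.filterNot { _ =>
      t.isDatasourceTable && t.isExternal && t.inputFormat.isEmpty
    }
}
```

The predicate never inspects the URI itself; it depends only on the table's other properties, which is why the lambda takes `_`.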

