GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21398#discussion_r190307041
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -230,11 +232,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // specify location for managed table. And in [[CreateDataSourceTableAsSelectCommand]] we have
         // to create the table directory and write out data before we create this table, to avoid
         // exposing a partial written table.
    -    val needDefaultTableLocation = tableDefinition.tableType == MANAGED &&
    -      tableDefinition.storage.locationUri.isEmpty
    -
    -    val tableLocation = if (needDefaultTableLocation) {
    -      Some(CatalogUtils.stringToURI(defaultTablePath(tableDefinition.identifier)))
    +    //
    +    // When using a remote metastore, and if a managed table is being created with its
    +    // location explicitly set to the location where it would be created anyway, then do
    +    // not set its location explicitly. This avoids an issue with Sentry in secure clusters.
    +    // Otherwise, the above comment applies.
    --- End diff --
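    To make the proposed condition concrete, here is a minimal sketch of the rule the new comment describes, not the actual patch: fill in a location only when the user gave none, and drop a user-supplied location when it already equals the computed default. The type and helper definitions below are illustrative stand-ins for the real catalog classes named in the diff.
    
    ```scala
    import java.net.URI
    
    // Illustrative stand-ins for the catalog types named in the diff above, so
    // the sketch compiles on its own; these are not the real Spark classes.
    sealed trait TableType
    case object MANAGED extends TableType
    case class Storage(locationUri: Option[URI])
    case class CatalogTableDef(tableType: TableType, storage: Storage, identifier: String)
    
    object LocationSketch {
      // Stand-in for HiveExternalCatalog.defaultTablePath; assumption: default
      // table locations live under the database's own directory.
      def defaultTablePath(identifier: String): String =
        s"/data/spark-warehouse/$identifier"
    
      // The behavior under discussion: set a location only when none was given;
      // if the given location equals the computed default, leave it unset so a
      // Sentry-secured remote metastore sees no explicit location at all.
      def resolveLocation(table: CatalogTableDef): Option[URI] = {
        val defaultUri = new URI(defaultTablePath(table.identifier))
        table.storage.locationUri match {
          case None if table.tableType == MANAGED => Some(defaultUri)
          case Some(uri) if uri == defaultUri     => None
          case other                              => other
        }
      }
    }
    ```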
    
    We have comments saying
    ```
    // We can't leave `locationUri` empty and count on Hive metastore to set a default table
    // location, because Hive metastore uses hive.metastore.warehouse.dir to generate default
    // table location for tables in default database, while we expect to use the location of
    // default database.
    ```
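    As a quick illustration of why that matters, suppose the metastore's warehouse dir and the default database's location diverge (REPL-style snippet, paths made up):
    
    ```scala
    // hive.metastore.warehouse.dir, used by a remote Hive metastore to derive
    // default table locations for the default database.
    val hiveWarehouseDir = "/user/hive/warehouse"
    // The location of the default database as Spark sees it.
    val defaultDbLocation = "/data/spark-warehouse"
    val table = "events"
    
    val hiveDefault  = s"$hiveWarehouseDir/$table"   // what the metastore would pick
    val sparkDefault = s"$defaultDbLocation/$table"  // what Spark expects
    assert(hiveDefault != sparkDefault)              // hence locationUri can't be left empty
    ```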
    
    In general I think it's OK to always set the location for managed tables, and it feels a little odd that some systems forbid it. Hive supports setting a location for managed tables; how shall we handle that?

