GitHub user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21398#discussion_r190307041
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -230,11 +232,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     // specify location for managed table. And in [[CreateDataSourceTableAsSelectCommand]] we have
     // to create the table directory and write out data before we create this table, to avoid
     // exposing a partial written table.
-    val needDefaultTableLocation = tableDefinition.tableType == MANAGED &&
-      tableDefinition.storage.locationUri.isEmpty
-
-    val tableLocation = if (needDefaultTableLocation) {
-      Some(CatalogUtils.stringToURI(defaultTablePath(tableDefinition.identifier)))
+    //
+    // When using a remote metastore, and if a managed table is being created with its
+    // location explicitly set to the location where it would be created anyway, then do
+    // not set its location explicitly. This avoids an issue with Sentry in secure clusters.
+    // Otherwise, the above comment applies.
--- End diff --
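To make the proposed guard concrete, here is a minimal sketch of the check the diff's new comment describes (the function and parameter names are hypothetical, not the patch's actual code): the location is omitted only when it already equals the default path the table would get anyway.

```scala
import java.net.URI

// Hypothetical helper sketching the condition described in the new comment:
// a managed table whose explicit location equals the default location can be
// stored without a location, letting the metastore fill in the default.
def shouldOmitLocation(
    isManaged: Boolean,
    locationUri: Option[URI],
    defaultLocation: URI): Boolean = {
  isManaged && locationUri.contains(defaultLocation)
}
```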
We have comments saying
```
// We can't leave `locationUri` empty and count on Hive metastore to set a default table
// location, because Hive metastore uses hive.metastore.warehouse.dir to generate default
// table location for tables in default database, while we expect to use the location of
// default database.
```
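For reference, a minimal sketch of the derivation that comment relies on (this is not Spark's actual `defaultTablePath` implementation; `dbLocationUri` and `tableName` are stand-in parameters):

```scala
import java.net.URI
import org.apache.hadoop.fs.Path

// Sketch: Spark resolves a managed table's default location relative to the
// parent database's location. The Hive metastore, left to choose on its own,
// would instead use hive.metastore.warehouse.dir for the default database.
def defaultTablePathSketch(dbLocationUri: URI, tableName: String): URI = {
  new Path(new Path(dbLocationUri), tableName).toUri
}
```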
In general I think it's OK to always set the location for managed tables, and it feels a little odd for a system to forbid it. Hive itself supports setting a location for managed tables; how should we handle that?
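To make that question concrete, here is a minimal sketch using Spark's internal catalog model (the table name, database, and path are made up): nothing in the model forbids a MANAGED table from carrying an explicit `locationUri`, which is exactly the combination under discussion.

```scala
import java.net.URI
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.StructType

// A managed table definition with an explicit location: legal in the catalog
// model, and something Hive itself accepts (CREATE TABLE ... LOCATION without
// the EXTERNAL keyword yields a managed table at a custom path).
val managedWithLocation = CatalogTable(
  identifier = TableIdentifier("t", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty.copy(
    locationUri = Some(new URI("/custom/warehouse/t"))),
  schema = new StructType().add("id", "int"))
```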
---