Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16290
If the default database has already been created in the metastore, any
subsequent change to `spark.sql.default.warehouse.dir` can break the data
source tables we then create in the default database (here, we assume Hive
support is enabled). Note that we will not hit this issue if we create a Hive
serde table in the default database, or a data source table in a non-default
database.
The directory of a managed data source table is created by Hive. When a new
data source table is created, Hive creates its directory based on the current
value of `hive.metastore.warehouse.dir`. However, the table location recorded
in the metastore points to a child directory of the default database's
location. Creating such a table succeeds without any error. The mismatch only
surfaces later: selecting from or inserting into the table fails because the
expected directory does not exist. This is a bug in the Hive metastore.
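The path mismatch can be illustrated with a small model. The Python sketch below is a simplified, hypothetical model of the behavior described above, not Spark or Hive code: `recorded_location`, `created_directory`, and `has_mismatch` are illustrative names, and it assumes that Hive derives the directory of a default-database table from the *current* warehouse directory while following the database location for other databases. It only models data source tables; Hive serde tables are not covered.

```python
import posixpath


# Hypothetical, simplified model of the behavior described above.
# None of these functions are Spark or Hive APIs.

def recorded_location(db_location: str, table: str) -> str:
    """Location stored in the metastore: a child of the database location."""
    return posixpath.join(db_location, table)


def created_directory(db_name: str, db_location: str,
                      current_warehouse_dir: str, table: str) -> str:
    """Directory Hive creates for a new managed data source table.

    Assumption: for the default database the path is derived from the
    *current* hive.metastore.warehouse.dir; for other databases it follows
    the database location.
    """
    base = current_warehouse_dir if db_name == "default" else db_location
    return posixpath.join(base, table)


def has_mismatch(db_name: str, db_location: str,
                 current_warehouse_dir: str, table: str) -> bool:
    """True if the recorded location differs from the directory created."""
    return (recorded_location(db_location, table)
            != created_directory(db_name, db_location,
                                 current_warehouse_dir, table))


old_wh = "/user/hive/warehouse"  # warehouse dir when `default` was created
new_wh = "/tmp/new-warehouse"    # hypothetical later value

print(has_mismatch("default", old_wh, new_wh, "t11"))             # True
print(has_mismatch("dilip", new_wh + "/dilip.db", new_wh, "t1"))  # False
print(has_mismatch("default", old_wh, old_wh, "t11"))             # False
```

In this model, only the combination of "default database" and "warehouse directory changed after that database was created" produces a mismatch, which matches the conditions listed at the start of this comment.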
@dilipbiswal hit this issue very recently. The output below shows the
locations of the two tables involved.
`t11` is a Hive-managed data source table we created in the default
database.
```
spark-sql> describe extended t11;
...
Storage(Location: file:/user/hive/warehouse/t11, InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde:
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties:
[serialization.format=1]))
Time taken: 0.105 seconds, Fetched 8 row(s)
```
`t1` is a Hive-managed data source table we created in the non-default
database.
```
spark-sql> use dilip;
Time taken: 0.028 seconds
spark-sql> describe extended t1;
...
Storage(Location:
file:/home/cloudera/mygit/apache/spark/bin/spark-warehouse/dilip.db/t1,
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,
Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties:
[serialization.format=1]))
```
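To spot potentially affected tables from output like the above, one can compare each table's recorded location with the current warehouse directory. The Python sketch below does this with a simple regex over the `describe extended` text. `extract_location` and `is_under` are hypothetical helpers, and the warehouse path is an assumption inferred from `t1`'s location, not something stated in this comment. A default-database table whose location falls outside the current warehouse directory is only a candidate for the mismatch, not proof of it.

```python
import re
from typing import Optional

# Value after "Location:" up to the next comma. \s* also skips the line
# break that follows "Location:" in the t1 output.
_LOCATION_RE = re.compile(r"Location:\s*(\S+?),")


def extract_location(describe_output: str) -> Optional[str]:
    """Return the table location from `describe extended` text, if present."""
    match = _LOCATION_RE.search(describe_output)
    return match.group(1) if match else None


def is_under(location: str, directory: str) -> bool:
    """True if `location` equals `directory` or lies beneath it."""
    base = directory.rstrip("/")
    return location == base or location.startswith(base + "/")


# Excerpts of the two outputs shown above.
t11_output = """Storage(Location: file:/user/hive/warehouse/t11, InputFormat:"""
t1_output = """Storage(Location:
file:/home/cloudera/mygit/apache/spark/bin/spark-warehouse/dilip.db/t1,
InputFormat:"""

# Assumption: the session's current warehouse directory, inferred from t1.
warehouse = "file:/home/cloudera/mygit/apache/spark/bin/spark-warehouse"

for name, out in [("t11", t11_output), ("t1", t1_output)]:
    loc = extract_location(out)
    print(name, loc, is_under(loc, warehouse))
```

Under that assumption, `t11` is reported as lying outside the warehouse directory and `t1` as inside it.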