[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...

gatorsmile Wed, 14 Dec 2016 22:17:09 -0800

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16290
  
    If the default database has already been created in the metastore, any 
subsequent change to `spark.sql.default.warehouse.dir` can trigger an issue 
when we create a data source table in the default database (here, we assume 
Hive support is enabled). Note that we will not hit any issue if we create a 
Hive serde table in the default database, or create a data source table in a 
non-default database. 
    
    The directory of a managed data source table is created by Hive. When 
creating a new data source table, the created directory is based on the current 
value of `hive.metastore.warehouse.dir`. However, the table location recorded 
in the metastore points to a child directory of the location of the 
default database. Thus, you will not hit any issue when you create such a 
table. However, the mismatch will cause a problem (because the expected 
directory does not exist) when we try to select from or insert into this table. 
This is a bug in the Hive metastore. 
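    
    To make the mismatch concrete, here is a minimal sketch in plain Python. 
It is a toy model of the path arithmetic only, not Spark or Hive code. The 
helper names are hypothetical. The default database location is taken from the 
`t11` output in this comment; the current warehouse directory is an assumption 
inferred from the location of `t1`. 

```python
import posixpath


def recorded_location(default_db_location: str, table: str) -> str:
    """Location stored in the metastore: a child of the default database's location."""
    return posixpath.join(default_db_location, table)


def created_directory(current_warehouse_dir: str, table: str) -> str:
    """Directory actually created: derived from the current warehouse dir setting."""
    return posixpath.join(current_warehouse_dir, table)


# Location of the default database, fixed when it was first created in the metastore.
default_db_location = "file:/user/hive/warehouse"

# ASSUMPTION: the warehouse dir in effect when the table is created,
# inferred from where the non-default database `dilip` keeps its tables.
current_warehouse_dir = "file:/home/cloudera/mygit/apache/spark/bin/spark-warehouse"

recorded = recorded_location(default_db_location, "t11")
created = created_directory(current_warehouse_dir, "t11")

# When the two differ, reads and writes look for a directory that was never created.
mismatch = recorded != created

print("recorded in metastore:", recorded)
print("created on disk:      ", created)
print("mismatch:             ", mismatch)
```

    If the warehouse directory had never changed after the default database 
was created, both helpers would return the same path and the table would be 
readable. 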
    
    @dilipbiswal hit this issue very recently. The locations of the two tables 
involved are shown below. 
    
    `t11` is a Hive managed data source table we created in the default 
database. 
    ```
    spark-sql> describe extended t11;
    ...
        Storage(Location: file:/user/hive/warehouse/t11, InputFormat: 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: 
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: 
[serialization.format=1]))    
    Time taken: 0.105 seconds, Fetched 8 row(s)
    ```  
    
    `t1` is a Hive managed data source table we created in a non-default 
database. 
    ```
    spark-sql> use dilip;
    Time taken: 0.028 seconds
    spark-sql> describe extended t1;
    ...
        Storage(Location: 
file:/home/cloudera/mygit/apache/spark/bin/spark-warehouse/dilip.db/t1, 
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, 
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, 
Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: 
[serialization.format=1]))    
    ```


