Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14750#discussion_r78114385
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -446,29 +449,46 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           table
         } else {
           getProviderFromTableProperties(table).map { provider =>
    -        assert(provider != "hive", "Hive serde table should not save provider in table properties.")
    -        // SPARK-15269: Persisted data source tables always store the location URI as a storage
    -        // property named "path" instead of standard Hive `dataLocation`, because Hive only
    -        // allows directory paths as location URIs while Spark SQL data source tables also
    -        // allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL
    -        // data source tables.
    -        // Spark SQL may also save external data source in Hive compatible format when
    -        // possible, so that these tables can be directly accessed by Hive. For these tables,
    -        // `dataLocation` is still necessary. Here we also check for input format because only
    -        // these Hive compatible tables set this field.
    -        val storage = if (table.tableType == EXTERNAL && table.storage.inputFormat.isEmpty) {
    -          table.storage.copy(locationUri = None)
    +        if (provider == "hive") {
    +          val schemaFromTableProps = getSchemaFromTableProperties(table)
    +          if (DataType.equalsIgnoreCaseAndNullability(schemaFromTableProps, table.schema)) {
    --- End diff --
    
    Yeah, but Hive allows users to change the physical storage layout by issuing an `ALTER TABLE` DDL statement:
    ```
    ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
      INTO num_buckets BUCKETS;
    ```
    
    That means that even if those columns are part of the schema, the `bucketSpec` could still be different.
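
    A minimal sketch of the concern, not the code in this PR: `Column`, `BucketSpec`, `TableMeta` and `consistent` below are hypothetical, simplified stand-ins for the catalog classes, used only to show that a schema match says nothing about the bucketing metadata.

    ```scala
    object BucketSpecSketch {
      // Hypothetical, simplified stand-ins; these are NOT Spark's catalog classes.
      case class Column(name: String, dataType: String)
      case class BucketSpec(
          numBuckets: Int,
          bucketColumnNames: Seq[String],
          sortColumnNames: Seq[String])
      case class TableMeta(schema: Seq[Column], bucketSpec: Option[BucketSpec])

      // Assumed check: both the schema and the bucketing metadata must agree.
      def consistent(fromProps: TableMeta, fromHive: TableMeta): Boolean =
        fromProps.schema == fromHive.schema && fromProps.bucketSpec == fromHive.bucketSpec

      val cols: Seq[Column] = Seq(Column("id", "int"), Column("name", "string"))

      // Metadata as it might have been saved in the table properties.
      val saved: TableMeta = TableMeta(cols, Some(BucketSpec(4, Seq("id"), Nil)))

      // Metadata after a Hive-side ALTER TABLE ... CLUSTERED BY ... INTO 8 BUCKETS:
      // the columns are unchanged, only the bucketing differs.
      val altered: TableMeta =
        saved.copy(bucketSpec = Some(BucketSpec(8, Seq("id"), Seq("id"))))
    }
    ```

    Here `saved.schema == altered.schema` holds while `consistent(saved, altered)` does not, so a schema-only comparison would not notice the change.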

