[GitHub] spark pull request #18849: [SPARK-21617][SQL] Store correct table metadata w...

vanzin Wed, 09 Aug 2017 10:25:34 -0700

Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18849#discussion_r132250345
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -1175,6 +1205,27 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         client.listFunctions(db, pattern)
       }
     
    +  /** Detect whether a table is stored with Hive-compatible metadata. */
    +  private def isHiveCompatible(table: CatalogTable): Boolean = {
    +    val provider = table.provider.orElse(table.properties.get(DATASOURCE_PROVIDER))
    +    if (provider.isDefined && provider != Some(DDLUtils.HIVE_PROVIDER)) {
    +      table.properties.get(DATASOURCE_HIVE_COMPATIBLE) match {
    +        case Some(value) =>
    +          value.toBoolean
    +        case _ =>
    +          // If the property is not set, the table may have been created by an old version
    +          // of Spark. Detect Hive compatibility by comparing the table's serde with the
    +          // serde for the table's data source. If they match, the table is Hive-compatible.
    +          // If they don't, they're not, because of some other table property that made it
    +          // not initially Hive-compatible.
    +          HiveSerDe.sourceToSerDe(provider.get) == table.storage.serde
    --- End diff --
    
    Case-sensitive tables are weird. Case sensitivity is a session configuration, 
but IMO that config should affect compatibility, because even if you create a 
table that is Hive-compatible initially, you could modify it later so that it's 
no longer Hive-compatible. It seems the 1.2 Hive libraries allow the broken 
metadata, while the 2.1 libraries complain about it.
    
    So yes, currently when case-sensitivity is enabled you still create tables 
that may be Hive-compatible, and this change forces those tables to not be 
Hive-compatible.
    
    As for existing tables, there's no way to know, because that information is 
not recorded anywhere in the table's metadata. (It's not recorded after my 
change either, so basically you can read that table with a case-insensitive 
session and who knows what might happen.)
    
    I'm ok with reverting this part since it's all a little hazy, but just 
wanted to point out that it's a kinda weird part of the code.
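
    To make the fallback in the quoted diff easier to reason about, here's a 
rough standalone sketch of the same decision order: an explicit compatibility 
property wins, otherwise compare serdes. Everything here is a simplified 
stand-in, not the actual Spark code: `Table`, `serdeFor`, and the two property 
keys are hypothetical placeholders, and the final `case _ => true` branch is an 
assumption, since the diff is cut off before the `else`.

```scala
import java.util.Locale

object HiveCompatSketch {
  // Hypothetical, stripped-down stand-in for Spark's CatalogTable.
  final case class Table(
      provider: Option[String],
      properties: Map[String, String],
      serde: Option[String])

  // Placeholder keys; the real constants are defined in HiveExternalCatalog.
  val ProviderKey = "provider"
  val HiveCompatibleKey = "hiveCompatible"
  val HiveProvider = "hive"

  // Illustrative source-to-serde mapping (stand-in for HiveSerDe.sourceToSerDe).
  private val serdeBySource: Map[String, String] = Map(
    "parquet" -> "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    "orc" -> "org.apache.hadoop.hive.ql.io.orc.OrcSerde")

  def serdeFor(source: String): Option[String] =
    serdeBySource.get(source.toLowerCase(Locale.ROOT))

  def isHiveCompatible(table: Table): Boolean = {
    val provider = table.provider.orElse(table.properties.get(ProviderKey))
    provider match {
      case Some(p) if p != HiveProvider =>
        table.properties.get(HiveCompatibleKey) match {
          // The explicit flag, when present, is authoritative.
          case Some(value) => value.toBoolean
          // No flag (e.g. table written by an older version): compare serdes.
          case None => serdeFor(p) == table.serde
        }
      // Assumption: tables without a data source provider, or with the
      // "hive" provider, are treated as plain Hive tables.
      case _ => true
    }
  }
}
```

    Note that nothing in this check looks at case sensitivity: a table written 
from a case-sensitive session without the flag would be classified purely by 
its serde, which is the hazy part described above.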



