Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18849#discussion_r132250345
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -1175,6 +1205,27 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         client.listFunctions(db, pattern)
       }
    +  /** Detect whether a table is stored with Hive-compatible metadata. */
    +  private def isHiveCompatible(table: CatalogTable): Boolean = {
    +    val provider = table.provider.orElse(table.properties.get(DATASOURCE_PROVIDER))
    +    if (provider.isDefined && provider != Some(DDLUtils.HIVE_PROVIDER)) {
    +      table.properties.get(DATASOURCE_HIVE_COMPATIBLE) match {
    +        case Some(value) =>
    +          value.toBoolean
    +        case _ =>
    +          // If the property is not set, the table may have been created by an old version
    +          // of Spark. Detect Hive compatibility by comparing the table's serde with the
    +          // serde for the table's data source. If they match, the table is Hive-compatible.
    +          // If they don't, they're not, because of some other table property that made it
    +          // not initially Hive-compatible.
    +          HiveSerDe.sourceToSerDe(provider.get) == table.storage.serde
--- End diff --
Case-sensitive tables are weird. Case sensitivity is a session configuration,
but IMO that config should affect compatibility, because even if you create a
table that is Hive-compatible initially, you could modify it later so that it
no longer is. It seems the Hive 1.2 libraries allow the broken metadata, while
the 2.1 libraries complain about it.
So yes, currently when case sensitivity is enabled you still create tables
that may be Hive-compatible, and this change forces those tables to not be
Hive-compatible.
As for existing tables, there's no way to know whether they were created with
case sensitivity enabled, because that information is not recorded anywhere in
the table's metadata. (It still isn't after my change, so you can read such a
table from a case-insensitive session and who knows what might happen.)
I'm ok with reverting this part since it's all a little hazy, but just
wanted to point out that it's a kinda weird part of the code.
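
To make the decision in the quoted diff easier to follow, here is a rough, self-contained sketch of it. It is not the Spark code: `TableSketch`, the property keys, and the `serdeFor` map are hypothetical stand-ins for `CatalogTable`, the `DATASOURCE_*` constants, and `HiveSerDe.sourceToSerDe`. The result for Hive-provider tables is also an assumption, since the quoted diff is cut off before that branch.

```scala
// Simplified stand-in for Spark's CatalogTable; NOT the real class.
final case class TableSketch(
    provider: Option[String],
    properties: Map[String, String],
    serde: Option[String])

object HiveCompatSketch {
  // Hypothetical property keys and values, chosen for illustration only.
  val ProviderKey = "sketch.provider"
  val HiveCompatibleKey = "sketch.hive.compatible"
  val HiveProvider = "hive"

  // Hypothetical provider-to-serde mapping standing in for HiveSerDe.sourceToSerDe.
  val serdeFor: Map[String, String] = Map("parquet" -> "example.ParquetSerDe")

  def isHiveCompatible(table: TableSketch): Boolean = {
    // Prefer the explicit provider, then fall back to the table properties.
    val provider = table.provider.orElse(table.properties.get(ProviderKey))
    provider match {
      case Some(p) if p != HiveProvider =>
        table.properties.get(HiveCompatibleKey) match {
          // An explicit compatibility flag wins when it is present.
          case Some(value) => value.toBoolean
          // No flag (possibly an older table): compare the stored serde
          // with the serde expected for the provider.
          case None => serdeFor.get(p) == table.serde
        }
      // Assumption: the quoted diff is truncated before this branch, so
      // treating Hive-provider tables as compatible is a guess.
      case _ => true
    }
  }
}
```

Nothing in the inputs to this check reflects the session's case-sensitivity setting, which is the gap described above: the result depends only on what is stored with the table.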