Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/19552#discussion_r146416338
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -388,7 +388,7 @@ object SQLConf {
.stringConf
.transform(_.toUpperCase(Locale.ROOT))
.checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
-    .createWithDefault(HiveCaseSensitiveInferenceMode.INFER_AND_SAVE.toString)
+    .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
--- End diff --
```INFER_AND_SAVE``` was introduced to fix the issue described in
[SPARK-19611](https://issues.apache.org/jira/browse/SPARK-19611), which broke any
Hive table that lacked the Spark-embedded table schema property. That covered
every table not created with Spark 2.0 or above, including tables created by
older versions of Spark SQL (this was the situation we ran into).
[SPARK-22306](https://issues.apache.org/jira/browse/SPARK-22306) uncovered issues
with how this behavior interacts with other Hive table properties. This change
resolves those problems by falling back to ```NEVER_INFER```, which matches the
default behavior prior to Spark 2.2.0. Out of the box, Spark still won't be
compatible with Hive tables backed by case-sensitive data files that weren't
created by Spark SQL 2.0 or above, but it will avoid mangling existing Hive table
properties. This is meant as a short-term fix until I can go back and
debug/resolve the conflicts that are occurring.
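For context, here is a minimal sketch of how a user who still needs case-sensitive
schema inference could opt back into the old behavior explicitly. It assumes the
```spark.sql.hive.caseSensitiveInferenceMode``` key that this SQLConf entry is
registered under; the table name is just a placeholder.

```scala
// Sketch only: opt back into case-sensitive schema inference for Hive tables
// created outside Spark SQL 2.0+ by setting the inference mode at session
// creation time. Assumes the config key
// "spark.sql.hive.caseSensitiveInferenceMode" backing this SQLConf entry.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")
  .getOrCreate()

// Subsequent reads of affected Hive tables will infer (and persist) the
// case-sensitive schema from the underlying data files.
// "some_case_sensitive_table" is a hypothetical table name.
spark.table("some_case_sensitive_table").printSchema()
```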
I think these issues have highlighted how brittle relying on Spark-specific Hive
table properties is, especially since it's impossible to predict how other
frameworks will use those properties themselves. That said, I don't think there's
a better way of doing this, and we may just have to deal with conflicts like this
as they arise.
---