Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/19552#discussion_r146416338
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -388,7 +388,7 @@ object SQLConf {
.stringConf
.transform(_.toUpperCase(Locale.ROOT))
.checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
-    .createWithDefault(HiveCaseSensitiveInferenceMode.INFER_AND_SAVE.toString)
+    .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
--- End diff --
```INFER_AND_SAVE``` was introduced to fix the issue described in
[SPARK-19611](https://issues.apache.org/jira/browse/SPARK-19611), which broke any
Hive table that lacked the Spark-embedded table schema property. That covered
every table not created with Spark 2.0 or above, including tables created by
older versions of Spark SQL (this was the situation we ran into).
[SPARK-22306](https://issues.apache.org/jira/browse/SPARK-22306) uncovered issues
with how this behavior interacts with other Hive table properties. This change
resolves those problems by falling back to ```NEVER_INFER```, which matches the
default behavior prior to Spark 2.2.0. Out of the box, Spark still won't be
compatible with Hive tables backed by case-sensitive data files that weren't
created by Spark SQL 2.0 or above, but it will avoid mangling existing Hive table
properties. This is meant as a short-term fix until I can go back and
debug/resolve the conflicts that are occurring.
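For context, here is a minimal sketch of how a user who still needs case-sensitive
schema inference could opt back into the old behavior explicitly. It assumes the
```spark.sql.hive.caseSensitiveInferenceMode``` key that this SQLConf entry is
registered under; the table name is just a placeholder.

```scala
// Sketch only: opt back into case-sensitive schema inference for Hive tables
// created outside Spark SQL 2.0+ by setting the inference mode at session
// creation time. Assumes the config key
// "spark.sql.hive.caseSensitiveInferenceMode" backing this SQLConf entry.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")
  .getOrCreate()

// Subsequent reads of affected Hive tables will infer (and persist) the
// case-sensitive schema from the underlying data files.
// "some_case_sensitive_table" is a hypothetical table name.
spark.table("some_case_sensitive_table").printSchema()
```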
I think these issues have highlighted how brittle relying on Spark-specific Hive
table properties is, especially since it's impossible to predict how other
frameworks will use those properties themselves. That said, I don't think there's
a better way of doing this, and we may just have to deal with conflicts like this
as they arise.
---