GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/19552
[SPARK-22329][SQL] Use NEVER_INFER for
`spark.sql.hive.caseSensitiveInferenceMode` by default
## What changes were proposed in this pull request?
In Spark 2.2.0, the default value of
`spark.sql.hive.caseSensitiveInferenceMode` causes a critical issue.
- [SPARK-19611](https://issues.apache.org/jira/browse/SPARK-19611) made
`INFER_AND_SAVE` the default in 2.2.0 because Spark 2.1.0 breaks some Hive
tables backed by case-sensitive data files.
> This situation will occur for any Hive table that wasn't created by
Spark or that was created prior to Spark 2.1.0. If a user attempts to run a
query over such a table containing a case-sensitive field name in the query
projection or in the query filter, the query will return 0 results in every
case.
- However, [SPARK-22306](https://issues.apache.org/jira/browse/SPARK-22306)
reports that this also corrupts the Hive Metastore schema by removing bucketing
information (BUCKETING_COLS, SORT_COLS) and changing the owner. These are
undesirable side effects. The Hive Metastore is a shared resource, and Spark
should not corrupt it by default.
- Since Spark 2.3.0 supports bucketing, BUCKETING_COLS and SORT_COLS at least
look okay there. However, we still need to figure out why the owner changes.
Also, we cannot backport the bucketing patch into `branch-2.2`. We need to
verify this option with more tests before releasing 2.3.0.
This PR proposes to restore the default of that option to `NEVER_INFER`, as it
was before Spark 2.2.0. Users who accept the risk can enable `INFER_AND_SAVE`
themselves.
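For illustration, a user who still wants the 2.2.0 behavior could opt in explicitly. The fragment below is a minimal sketch, assuming the setting is placed in `conf/spark-defaults.conf` (the file location is an assumption about the deployment, not part of this patch); the same key and value can also be passed with `--conf` on the command line.

```properties
# Assumption: this goes in conf/spark-defaults.conf of the Spark installation.
# Opt in to schema inference with write-back to the Hive Metastore.
# Note the side effects reported in SPARK-22306 before enabling this.
spark.sql.hive.caseSensitiveInferenceMode  INFER_AND_SAVE
```

Leaving the key unset would pick up the default proposed here, `NEVER_INFER`.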
## How was this patch tested?
Pass the existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-22329
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19552.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19552
----
commit a256627dbc2772e69cd0f9f2aa43b384165e3657
Author: Dongjoon Hyun <[email protected]>
Date: 2017-10-22T17:59:15Z
[SPARK-22329][SQL] Use NEVER_INFER for
`spark.sql.hive.caseSensitiveInferenceMode` by default
----