Repository: spark
Updated Branches:
  refs/heads/branch-2.1 ba505805d -> d99b49b11


[SPARK-20450][SQL] Unexpected first-query schema inference cost with 2.1.1

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19611 fixes a regression from 2.0 where Spark silently fails to read case-sensitive fields when a case-sensitive schema is missing from the table properties. The fix is to detect this situation, infer the schema, and write the case-sensitive schema into the metastore. However, this can incur an unexpected performance hit the first time such a problematic table is queried (and there is a high false-positive rate here, since most tables don't actually have case-sensitive fields). This PR changes the default to NEVER_INFER (the same behavior as 2.1.0). In 2.2, we can consider leaving the default as INFER_AND_SAVE.

## How was this patch tested?

Unit tests.

Author: Eric Liang <e...@databricks.com>

Closes #17749 from ericl/spark-20450.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d99b49b1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d99b49b1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d99b49b1

Branch: refs/heads/branch-2.1
Commit: d99b49b11a44ba13d126caf3e6e086f5b5b04827
Parents: ba50580
Author: Eric Liang <e...@databricks.com>
Authored: Tue Apr 25 00:33:09 2017 +0200
Committer: Herman van Hovell <hvanhov...@databricks.com>
Committed: Tue Apr 25 00:33:09 2017 +0200

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d99b49b1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index ad5b103..5926bb0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -298,7 +298,7 @@ object SQLConf {
     .stringConf
     .transform(_.toUpperCase())
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
-    .createWithDefault(HiveCaseSensitiveInferenceMode.INFER_AND_SAVE.toString)
+    .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)

   val OPTIMIZER_METADATA_ONLY = SQLConfigBuilder("spark.sql.optimizer.metadataOnly")
     .doc("When true, enable the metadata-only query optimization that use the table's metadata " +


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
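Note for readers of this change: the default flipped above is a session-level setting, so users on branch-2.1 who do want the 2.1.1 inference-and-save behavior can opt back in at runtime. A minimal sketch, assuming the SQLConf entry changed in this diff is exposed under the key `spark.sql.hive.caseSensitiveInferenceMode` and that a `SparkSession` named `spark` is in scope (as in spark-shell):

```scala
// Opt back into schema inference for case-sensitive Hive table fields.
// NEVER_INFER is the new default set by this commit; the other accepted
// values are INFER_AND_SAVE and INFER_ONLY.
spark.conf.set("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")

// Equivalently, it can be set at launch time, e.g.:
//   spark-shell --conf spark.sql.hive.caseSensitiveInferenceMode=INFER_AND_SAVE
```

With INFER_AND_SAVE, the first query against an affected table still pays the one-time inference cost described in the PR; NEVER_INFER avoids it at the risk of silently dropping case-sensitive fields.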