Repository: spark
Updated Branches:
  refs/heads/branch-2.1 ba505805d -> d99b49b11


[SPARK-20450][SQL] Unexpected first-query schema inference cost with 2.1.1

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19611 fixes a regression from 2.0 where Spark silently fails to read case-sensitive fields when a case-sensitive schema is missing from the table properties. The fix is to detect this situation, infer the schema, and write the case-sensitive schema into the metastore. However, this can incur an unexpected performance hit the first time such a problematic table is queried (and there is a high false-positive rate here, since most tables don't actually have case-sensitive fields). This PR changes the default to NEVER_INFER (the same behavior as 2.1.0). In 2.2, we can consider leaving the default as INFER_AND_SAVE.

## How was this patch tested?

Unit tests.

Author: Eric Liang <e...@databricks.com>

Closes #17749 from ericl/spark-20450.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d99b49b1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d99b49b1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d99b49b1

Branch: refs/heads/branch-2.1
Commit: d99b49b11a44ba13d126caf3e6e086f5b5b04827
Parents: ba50580
Author: Eric Liang <e...@databricks.com>
Authored: Tue Apr 25 00:33:09 2017 +0200
Committer: Herman van Hovell <hvanhov...@databricks.com>
Committed: Tue Apr 25 00:33:09 2017 +0200

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d99b49b1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index ad5b103..5926bb0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -298,7 +298,7 @@ object SQLConf {
     .stringConf
     .transform(_.toUpperCase())
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
-    .createWithDefault(HiveCaseSensitiveInferenceMode.INFER_AND_SAVE.toString)
+    .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)

   val OPTIMIZER_METADATA_ONLY = SQLConfigBuilder("spark.sql.optimizer.metadataOnly")
     .doc("When true, enable the metadata-only query optimization that use the table's metadata " +


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
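Note for readers of this change: the default flipped above is a session-level setting, so users on branch-2.1 who do want the 2.1.1 inference-and-save behavior can opt back in at runtime. A minimal sketch, assuming the SQLConf entry changed in this diff is exposed under the key `spark.sql.hive.caseSensitiveInferenceMode` and that a `SparkSession` named `spark` is in scope (as in spark-shell):

```scala
// Opt back into schema inference for case-sensitive Hive table fields.
// NEVER_INFER is the new default set by this commit; the other accepted
// values are INFER_AND_SAVE and INFER_ONLY.
spark.conf.set("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")

// Equivalently, it can be set at launch time, e.g.:
//   spark-shell --conf spark.sql.hive.caseSensitiveInferenceMode=INFER_AND_SAVE
```

With INFER_AND_SAVE, the first query against an affected table still pays the one-time inference cost described in the PR; NEVER_INFER avoids it at the risk of silently dropping case-sensitive fields.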