This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 9e8c4aa3f43a [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
9e8c4aa3f43a is described below

commit 9e8c4aa3f43a3d99bff56cca319db623abc473ee
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Tue Apr 30 01:44:37 2024 -0700

    [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
    
    ### What changes were proposed in this pull request?
    
    This PR aims to switch `spark.sql.legacy.createHiveTableByDefault` to `false` by default in order to move away from this legacy behavior starting with `Apache Spark 4.0.0`. The legacy behavior remains available during the Apache Spark 4.x period by setting `spark.sql.legacy.createHiveTableByDefault=true`.
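
    For illustration, a minimal `spark-shell` sketch of the new default (the table names are hypothetical; the legacy path assumes a Hive-enabled build):

    ```scala
    // New default: CREATE TABLE without USING/STORED AS picks the
    // spark.sql.sources.default provider (parquet out of the box), not Hive.
    spark.sql("CREATE TABLE t1(id INT)")
    spark.sql("DESCRIBE TABLE EXTENDED t1").show(truncate = false)  // Provider: parquet

    // Opt back into the legacy behavior during the Apache Spark 4.x period.
    spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "true")
    spark.sql("CREATE TABLE t2(id INT)")  // created as a Hive serde table again
    ```

    Note that the legacy path still requires Hive support; without it, such a `CREATE TABLE` fails with `NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT`, which is why the PySpark test below no longer expects that error.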
    
    ### Why are the changes needed?
    
    Historically, this behavior change was merged during `Apache Spark 3.0.0` development in SPARK-30098 and officially reverted during the `3.0.0 RC` period.
    
    - 2019-12-06: #26736 (58be82a)
    - 2019-12-06: https://lists.apache.org/thread/g90dz1og1zt4rr5h091rn1zqo50y759j
    - 2020-05-16: #28517
    
    At `Apache Spark 3.1.0`, we had another discussion and defined this as `Legacy` behavior via a new configuration, reusing the JIRA ID SPARK-30098.
    - 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
    - 2020-12-03: #30554
    
    Last year, this was proposed again twice, and `Apache Spark 4.0.0` is a good time to make a decision for Apache Spark's future direction.
    - SPARK-42603 on 2023-02-27 as an independent idea.
    - SPARK-46122 on 2023-11-27 as a part of the Apache Spark 4.0.0 idea.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, the migration document is updated.
    
    ### How was this patch tested?
    
    Pass the CIs with the adjusted test cases.
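
    For reference, a rough Scala analogue of the adjusted PySpark `test_create_without_provider` expectation (the table name is taken from the test; the 100-row DataFrame and session setup are assumed):

    ```scala
    // Creating a table via DataFrameWriterV2 without an explicit provider
    // no longer raises NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT; it now
    // creates a table using the default datasource provider.
    val df = spark.range(100).toDF("id")
    df.writeTo("test_table").create()
    assert(spark.sql("SELECT * FROM test_table").count() == 100)
    ```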
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #46207 from dongjoon-hyun/SPARK-46122.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 docs/sql-migration-guide.md                                       | 1 +
 python/pyspark/sql/tests/test_readwriter.py                       | 5 ++---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala    | 2 +-
 .../apache/spark/sql/execution/command/PlanResolutionSuite.scala  | 8 +++-----
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 1e0fdadde1e3..07562babc87d 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -25,6 +25,7 @@ license: |
 ## Upgrading from Spark SQL 3.5 to 4.0
 
 - Since Spark 4.0, `spark.sql.ansi.enabled` is on by default. To restore the previous behavior, set `spark.sql.ansi.enabled` to `false` or `SPARK_ANSI_SQL_MODE` to `false`.
+- Since Spark 4.0, `CREATE TABLE` syntax without `USING` and `STORED AS` will use the value of `spark.sql.sources.default` as the table provider instead of `Hive`. To restore the previous behavior, set `spark.sql.legacy.createHiveTableByDefault` to `true`.
 - Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first normalize keys -0.0 to 0.0. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.
 - Since Spark 4.0, the default value of `spark.sql.maxSinglePartitionBytes` is changed from `Long.MaxValue` to `128m`. To restore the previous behavior, set `spark.sql.maxSinglePartitionBytes` to `9223372036854775807`(`Long.MaxValue`).
 - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`.
diff --git a/python/pyspark/sql/tests/test_readwriter.py b/python/pyspark/sql/tests/test_readwriter.py
index 5784d2c72973..e752856d0316 100644
--- a/python/pyspark/sql/tests/test_readwriter.py
+++ b/python/pyspark/sql/tests/test_readwriter.py
@@ -247,10 +247,9 @@ class ReadwriterV2TestsMixin:
 
     def test_create_without_provider(self):
         df = self.df
-        with self.assertRaisesRegex(
-            AnalysisException, "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT"
-        ):
+        with self.table("test_table"):
             df.writeTo("test_table").create()
+            self.assertEqual(100, self.spark.sql("select * from test_table").count())
 
     def test_table_overwrite(self):
         df = self.df
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index ac4a4ef90d0d..df4675865337 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -4457,7 +4457,7 @@ object SQLConf {
         s"instead of the value of ${DEFAULT_DATA_SOURCE_NAME.key} as the table 
provider.")
       .version("3.1.0")
       .booleanConf
-      .createWithDefault(true)
+      .createWithDefault(false)
 
   val LEGACY_CHAR_VARCHAR_AS_STRING =
     buildConf("spark.sql.legacy.charVarcharAsString")
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala
index 60f86ede7279..f004ab7137f7 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala
@@ -2847,11 +2847,9 @@ class PlanResolutionSuite extends AnalysisTest {
     assert(desc.viewText.isEmpty)
     assert(desc.viewQueryColumnNames.isEmpty)
     assert(desc.storage.locationUri.isEmpty)
-    assert(desc.storage.inputFormat ==
-        Some("org.apache.hadoop.mapred.TextInputFormat"))
-    assert(desc.storage.outputFormat ==
-        Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"))
-    assert(desc.storage.serde == Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
+    assert(desc.storage.inputFormat.isEmpty)
+    assert(desc.storage.outputFormat.isEmpty)
+    assert(desc.storage.serde.isEmpty)
     assert(desc.storage.properties.isEmpty)
     assert(desc.properties.isEmpty)
     assert(desc.comment.isEmpty)


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
