dtenedor commented on code in PR #37431:
URL: https://github.com/apache/spark/pull/37431#discussion_r943921356


##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1657,6 +1656,28 @@ class InsertSuite extends DataSourceTest with 
SharedSparkSession {
     }
   }
 
+  test("SPARK-40001 JSON DEFAULT columns require 
JSON_GENERATOR_IGNORE_NULL_FIELDS off") {
+    val error = "DEFAULT values are not supported for JSON tables"
+    withTable("t") {
+      assert(intercept[AnalysisException] {
+        sql("create table t (a int default 42) using json")

Review Comment:
   Thanks for the useful comment! This made me realize that other writers may 
also produce JSON that omits NULL fields from storage entirely. I fixed this by:
   
   1) Reverted the analyzer changes in this PR.
   2) Changed the new config to 
`DEFAULT_COLUMN_JSON_GENERATOR_FORCE_NULL_FIELDS`, which overrides any other 
settings so that target columns with DEFAULT values always write explicit NULLs 
to storage.
   3) Updated the `DEFAULT_COLUMN_ALLOWED_PROVIDERS` config to ban `ALTER TABLE 
ADD COLUMN` commands with `DEFAULT` values on JSON tables, with a descriptive 
error message.
   
   This guarantees correctness for JSON `DEFAULT` columns: new rows with NULL 
values always get explicit NULLs written to the JSON storage, so subsequent 
scans can distinguish those NULLs from fields that were never written and 
should therefore take the `DEFAULT` value.
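   To illustrate the distinction the scan relies on, here is a minimal 
standalone Scala sketch (not the actual Spark implementation; the names 
`DefaultColumnSketch` and `resolve` are hypothetical). It models a parsed JSON 
row as a map where an absent key means the field was omitted from storage, 
while a key mapped to `None` means an explicit JSON null was written:

```scala
// Hypothetical sketch of why explicit NULLs must be written for DEFAULT
// columns: a scan can only apply the DEFAULT when the field is truly absent.
object DefaultColumnSketch {
  // Assumed DEFAULT value for column "a", mirroring `a int default 42` above.
  val default = 42

  // If the column is absent from storage, the DEFAULT applies; if it was
  // written as an explicit null, the row keeps a genuine NULL (None).
  def resolve(row: Map[String, Option[Int]], col: String): Option[Int] =
    row.get(col) match {
      case None        => Some(default) // field missing: fill in the DEFAULT
      case Some(value) => value         // field present, possibly explicit null
    }

  def main(args: Array[String]): Unit = {
    val missing  = Map.empty[String, Option[Int]]  // writer dropped the field
    val explicit = Map("a" -> (None: Option[Int])) // writer forced "a": null
    println(resolve(missing, "a"))  // Some(42): DEFAULT applies
    println(resolve(explicit, "a")) // None: a genuine stored NULL
  }
}
```

   If a writer drops NULL fields (as `JSON_GENERATOR_IGNORE_NULL_FIELDS` 
allows), the two cases above become indistinguishable on read, which is why 
the config forces explicit NULLs for target columns with DEFAULT values.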



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

