dtenedor commented on code in PR #37431:
URL: https://github.com/apache/spark/pull/37431#discussion_r943921356


##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1657,6 +1656,28 @@ class InsertSuite extends DataSourceTest with 
SharedSparkSession {
     }
   }
 
+  test("SPARK-40001 JSON DEFAULT columns require 
JSON_GENERATOR_IGNORE_NULL_FIELDS off") {
+    val error = "DEFAULT values are not supported for JSON tables"
+    withTable("t") {
+      assert(intercept[AnalysisException] {
+        sql("create table t (a int default 42) using json")

Review Comment:
   Thanks for the useful comment! This made me realize that other writers may 
also produce JSON that omits NULL fields from storage entirely. I fixed this by:
   
   1) Reverted the analyzer changes in this PR.
   2) Changed the new config to 
`DEFAULT_COLUMN_JSON_GENERATOR_FORCE_NULL_FIELDS`, which overrides any other 
settings so that target columns with DEFAULT values always write explicit NULLs 
to storage.
   3) Updated the `DEFAULT_COLUMN_ALLOWED_PROVIDERS` config to ban `ALTER TABLE 
ADD COLUMN` commands with `DEFAULT` values on JSON tables, with a descriptive 
error message.
   
   This guarantees correctness for JSON `DEFAULT` columns: new rows with NULL 
values always get explicit NULLs written to the JSON storage, so subsequent 
scans can distinguish those NULLs from fields that were never written and 
should therefore take the `DEFAULT` value.
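   To illustrate the distinction the scan relies on, here is a minimal 
standalone Scala sketch (not the actual Spark implementation; the names 
`DefaultColumnSketch` and `resolve` are hypothetical). It models a parsed JSON 
row as a map where an absent key means the field was omitted from storage, 
while a key mapped to `None` means an explicit JSON null was written:

```scala
// Hypothetical sketch of why explicit NULLs must be written for DEFAULT
// columns: a scan can only apply the DEFAULT when the field is truly absent.
object DefaultColumnSketch {
  // Assumed DEFAULT value for column "a", mirroring `a int default 42` above.
  val default = 42

  // If the column is absent from storage, the DEFAULT applies; if it was
  // written as an explicit null, the row keeps a genuine NULL (None).
  def resolve(row: Map[String, Option[Int]], col: String): Option[Int] =
    row.get(col) match {
      case None        => Some(default) // field missing: fill in the DEFAULT
      case Some(value) => value         // field present, possibly explicit null
    }

  def main(args: Array[String]): Unit = {
    val missing  = Map.empty[String, Option[Int]]  // writer dropped the field
    val explicit = Map("a" -> (None: Option[Int])) // writer forced "a": null
    println(resolve(missing, "a"))  // Some(42): DEFAULT applies
    println(resolve(explicit, "a")) // None: a genuine stored NULL
  }
}
```

   If a writer drops NULL fields (as `JSON_GENERATOR_IGNORE_NULL_FIELDS` 
allows), the two cases above become indistinguishable on read, which is why 
the config forces explicit NULLs for target columns with DEFAULT values.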



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

