dtenedor commented on code in PR #37280:
URL: https://github.com/apache/spark/pull/37280#discussion_r930566861
##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1800,7 +1800,7 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
if (testCase.insertNullsToStorage) {
null
} else {
- Row(Seq(Row(3, 4)), Seq(Map(false -> "mno", true -> "pqr")))
+ Row(Seq(Row(1, 2)), Seq(Map(false -> "def", true -> "jkl")))
Review Comment:
Hi Jiaan, good question:
* It seems the correct result for this row should actually be `2, NULL` :)
Because we run `alter table t alter column s drop default` and then `insert
into t select 2, default`, the default value should be NULL here.
* This is likely happening because the INSERT operations into the Orc table
all write to the same column in the same file. The column default scanning
implementation for the Orc data source substitutes the existence default
value (the one assigned when the column was created) whenever the *entire
column* is missing from a file.
* This is a known existing bug that @gengliangwang and I recently
discovered. The fix is to switch to a new file after changing the default
value in this way. I will prepare a separate PR to fix this bug.
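To make the failure mode above concrete, here is a small illustrative model (plain Python, not Spark's actual reader code; the helper `read_column` and its signature are hypothetical) of how an existence default is resolved per file: the default captured at column-creation time is substituted only when the column is entirely absent from a file, so two rows sharing one file also share one default.

```python
def read_column(file_columns, col, existence_default, num_rows):
    """Hypothetical per-file column reader.

    file_columns: dict mapping column name -> list of stored values for one file
    existence_default: the default assigned when the column was created
    """
    if col not in file_columns:
        # Entire column missing from this file: substitute the existence
        # default for every row, regardless of any later DROP DEFAULT.
        return [existence_default] * num_rows
    return file_columns[col]

# Buggy layout: both INSERTs land in one file whose `s` column is absent,
# so the pre-DROP default ('abc' here, a made-up value) leaks into row 2,
# which should have been NULL (None).
one_file = {"i": [1, 2]}  # column `s` missing entirely
print(read_column(one_file, "s", "abc", 2))  # ['abc', 'abc'] -- wrong for row 2

# Sketch of the proposed fix: start a new file after ALTER ... DROP DEFAULT,
# so the second file's missing column resolves against the new (null) default.
file_a = {"i": [1]}  # written while the default was 'abc'
file_b = {"i": [2]}  # written after DROP DEFAULT -> default is now None
print(read_column(file_a, "s", "abc", 1) +
      read_column(file_b, "s", None, 1))  # ['abc', None] -- correct
```

The model shows why switching to a new file at the point of the default change is sufficient: the existence default is a per-file decision, so isolating the post-change rows in their own file gives them the new default.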
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]