dtenedor commented on code in PR #37280:
URL: https://github.com/apache/spark/pull/37280#discussion_r930566861
##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1800,7 +1800,7 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
if (testCase.insertNullsToStorage) {
null
} else {
- Row(Seq(Row(3, 4)), Seq(Map(false -> "mno", true -> "pqr")))
+ Row(Seq(Row(1, 2)), Seq(Map(false -> "def", true -> "jkl")))
Review Comment:
Hi Jiaan, good question:
* It seems the correct result for this row should actually be `2, NULL` :)
Because we run `alter table t alter column s drop default` and then `insert
into t select 2, default`, the default value should be NULL here.
* This is likely happening because the INSERT operations into the Orc table
all write to the same column in the same file. The column default scanning
implementation for the Orc data source substitutes the existence default
value (the one assigned when the column was created) whenever the *entire
column* is missing from a file.
* This is a known existing bug that @gengliangwang and I recently
discovered. The fix is to switch to a new file after changing the default
value in this way. I will prepare a separate PR to fix this bug.
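To make the failure mode above concrete, here is a small illustrative model (plain Python, not Spark's actual reader code; the helper `read_column` and its signature are hypothetical) of how an existence default is resolved per file: the default captured at column-creation time is substituted only when the column is entirely absent from a file, so two rows sharing one file also share one default.

```python
def read_column(file_columns, col, existence_default, num_rows):
    """Hypothetical per-file column reader.

    file_columns: dict mapping column name -> list of stored values for one file
    existence_default: the default assigned when the column was created
    """
    if col not in file_columns:
        # Entire column missing from this file: substitute the existence
        # default for every row, regardless of any later DROP DEFAULT.
        return [existence_default] * num_rows
    return file_columns[col]

# Buggy layout: both INSERTs land in one file whose `s` column is absent,
# so the pre-DROP default ('abc' here, a made-up value) leaks into row 2,
# which should have been NULL (None).
one_file = {"i": [1, 2]}  # column `s` missing entirely
print(read_column(one_file, "s", "abc", 2))  # ['abc', 'abc'] -- wrong for row 2

# Sketch of the proposed fix: start a new file after ALTER ... DROP DEFAULT,
# so the second file's missing column resolves against the new (null) default.
file_a = {"i": [1]}  # written while the default was 'abc'
file_b = {"i": [2]}  # written after DROP DEFAULT -> default is now None
print(read_column(file_a, "s", "abc", 1) +
      read_column(file_b, "s", None, 1))  # ['abc', None] -- correct
```

The model shows why switching to a new file at the point of the default change is sufficient: the existence default is a per-file decision, so isolating the post-change rows in their own file gives them the new default.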
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]