dtenedor commented on code in PR #39362:
URL: https://github.com/apache/spark/pull/39362#discussion_r1060818425
##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1943,7 +1944,11 @@ class InsertSuite extends DataSourceTest with
SharedSparkSession {
Row(Seq(Row(1, 2)), Seq(Map(false -> "def", true -> "jkl"))),
Seq(Map(true -> "xyz"))),
Row(2,
- null,
+ if (config.dataSource != "orc") {
Review Comment:
> Thank you for review, @dtenedor .
>
> * Please see https://issues.apache.org/jira/browse/SPARK-41782 . We have a
> benchmark to detect this kind of perf regression. You can run it locally in
> your environment.
Thanks @dongjoon-hyun for the benchmark! The Jira contains only the title
`Regenerate benchmark results`. Are there instructions for how to run the
benchmark?
> * This is a partial revert to the original code which is the existing
> behavior before your PR like the previous Spark. As I mentioned in the PR
> description, [SPARK-39862](https://issues.apache.org/jira/browse/SPARK-39862)
> should propose a fix without perf regression.
>
> New feature is good as long as not breaking the old behavior.
Agreed. However, that bug fix was merged into Spark 3.3 on Jul. 28, 2022.
Could users have built pipelines since then that rely on the new feature and
would return incorrect results if we merged this PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]