[GitHub] [spark] dtenedor commented on a diff in pull request #39362: [SPARK-41858][SQL] Fix ORC reader perf regression due to DEFAULT value feature

GitBox Tue, 03 Jan 2023 09:49:47 -0800


dtenedor commented on code in PR #39362:
URL: https://github.com/apache/spark/pull/39362#discussion_r1060818425



##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1943,7 +1944,11 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
               Row(Seq(Row(1, 2)), Seq(Map(false -> "def", true -> "jkl"))),
               Seq(Map(true -> "xyz"))),
             Row(2,
-              null,
+              if (config.dataSource != "orc") {

Review Comment:
   > Thank you for review, @dtenedor .
   > 
   > * Please see https://issues.apache.org/jira/browse/SPARK-41782 . We have a 
benchmark to detect this kind of perf regression. You can run it locally in 
your environment.
   
   Thanks @dongjoon-hyun for the benchmark! The Jira contains only the title 
`Regenerate benchmark results`. Are there instructions for how to run the 
benchmark?
   
   > * This is a partial revert to the original code which is the existing 
behavior before your PR like the previous Spark. As I mentioned in the PR 
description, [SPARK-39862](https://issues.apache.org/jira/browse/SPARK-39862) 
should propose a fix without perf regression.
   > 
   > New feature is good as long as not breaking the old behavior.
   
   Agreed on this. However, that bug fix was merged into Spark 3.3 on Jul. 28, 
2022. Is it possible that users have since built pipelines on the new feature, 
and that those pipelines would return incorrect results if we merged this PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


