ZiyaZa commented on code in PR #55967:
URL: https://github.com/apache/spark/pull/55967#discussion_r3265540191


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala:
##########
@@ -518,6 +518,21 @@ case class MergeRowsExec(
       private val notMatchedBySourceInstructions: Seq[InstructionExec])
     extends Iterator[InternalRow] {
 
+    // Resolve metrics once per partition; longMetric(name) does a map lookup 
on each call.
+    // See SPARK-56933.
+    private val numTargetRowsCopied = 
MergeRowsExec.this.longMetric("numTargetRowsCopied")

Review Comment:
   nit: I believe `MergeRowsExec.this.longMetric()` can be replaced with just 
`longMetric()`



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala:
##########
@@ -518,6 +518,21 @@ case class MergeRowsExec(
       private val notMatchedBySourceInstructions: Seq[InstructionExec])
     extends Iterator[InternalRow] {
 
+    // Resolve metrics once per partition; longMetric(name) does a map lookup 
on each call.
+    // See SPARK-56933.
+    private val numTargetRowsCopied = 
MergeRowsExec.this.longMetric("numTargetRowsCopied")
+    private val numTargetRowsInserted = 
MergeRowsExec.this.longMetric("numTargetRowsInserted")
+    private val numTargetRowsDeleted = 
MergeRowsExec.this.longMetric("numTargetRowsDeleted")
+    private val numTargetRowsUpdated = 
MergeRowsExec.this.longMetric("numTargetRowsUpdated")
+    private val numTargetRowsMatchedUpdated =
+      MergeRowsExec.this.longMetric("numTargetRowsMatchedUpdated")
+    private val numTargetRowsMatchedDeleted =
+      MergeRowsExec.this.longMetric("numTargetRowsMatchedDeleted")
+    private val numTargetRowsNotMatchedBySourceUpdated =
+      MergeRowsExec.this.longMetric("numTargetRowsNotMatchedBySourceUpdated")
+    private val numTargetRowsNotMatchedBySourceDeleted =
+      MergeRowsExec.this.longMetric("numTargetRowsNotMatchedBySourceDeleted")

Review Comment:
   Should we make all these lazy to not look them up when not needed?



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/MergeRowsExecBenchmark.scala:
##########
@@ -101,6 +105,33 @@ object MergeRowsExecBenchmark extends SqlBasedBenchmark 
with ClassicConversions
     Dataset.ofRows(spark, mergeRows)
   }
 
+  /**
+   * Like [[codegenBenchmark]], but with JIT warm-up and a longer timed window 
so interpreted
+   * (whole-stage off) results are more stable when comparing metric caching 
changes.
+   */
+  private def mergeRowsCodegenBenchmark(name: String, cardinality: Long)(f: => 
Unit): Unit = {

Review Comment:
   I am unsure about the benchmark numbers in the PR description. It says it's 
comparing `origin/master` to this PR, but here we are making changes to how 
these benchmarks are run. Are we doing apples-to-apples comparison here? Maybe 
it would be better to compare the following two:
   - This PR
   - This PR without the changes in MergeRowsExec



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/MergeRowsExecBenchmark.scala:
##########
@@ -101,6 +105,33 @@ object MergeRowsExecBenchmark extends SqlBasedBenchmark 
with ClassicConversions
     Dataset.ofRows(spark, mergeRows)
   }
 
+  /**
+   * Like [[codegenBenchmark]], but with JIT warm-up and a longer timed window 
so interpreted
+   * (whole-stage off) results are more stable when comparing metric caching 
changes.
+   */
+  private def mergeRowsCodegenBenchmark(name: String, cardinality: Long)(f: => 
Unit): Unit = {
+    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { f }
+    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true") { f }

Review Comment:
   I'm not familiar with the benchmarking infra, but it seems `warmupTime` 
parameter of the Benchmark is controlling this already, making these redundant?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to