szehon-ho opened a new pull request, #55967:
URL: https://github.com/apache/spark/pull/55967

   ### What changes were proposed in this pull request?
   
   Cache `SQLMetric` references once per partition in `MergeRowIterator` and 
update them directly in the hot loop. Previously, each row called 
`longMetric("…")`, which performs a `metrics(name)` map lookup on every 
increment (up to 2–3 lookups per delete/update row).
   
   This matches the pattern used elsewhere (e.g. `FilterEvaluatorFactory` 
passes a `SQLMetric` into the partition evaluator). The whole-stage codegen 
path is unchanged; it already resolves metrics once via `metricTerm`.
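
   For illustration, the difference between the per-row lookup and the cached reference can be sketched as follows. This is a minimal, Spark-free sketch, not the code in this PR: `Counter` is a hypothetical stand-in for `SQLMetric`, and the metric name `rowsDeleted` and both helper methods are made up for the example.

   ```scala
   // Hypothetical stand-in for SQLMetric: a simple mutable counter.
   final class Counter {
     private var total = 0L
     def add(v: Long): Unit = total += v
     def value: Long = total
   }

   object MetricCachingSketch {
     // Before: the counter is resolved by name for every row,
     // so each increment pays for a map lookup.
     def countWithLookup(metrics: Map[String, Counter], rows: Iterator[Int]): Unit =
       rows.foreach { _ => metrics("rowsDeleted").add(1L) }

     // After: the counter is resolved once per partition and
     // the hot loop updates the cached reference directly.
     def countCached(metrics: Map[String, Counter], rows: Iterator[Int]): Unit = {
       val deleted = metrics("rowsDeleted")
       rows.foreach { _ => deleted.add(1L) }
     }
   }
   ```

   Both variants produce the same counts; only the number of map lookups per row differs.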
   
   ### Why are the changes needed?
   
   `MergeRowsExec` updates multiple MERGE metrics per output row on the 
interpreted path (`doExecute` / `MergeRowIterator`). For delete-heavy workloads 
with little projection work, repeated map lookups were a noticeable fraction of 
per-row cost. Production MERGE typically runs with whole-stage codegen enabled, 
but the interpreted path is still used when codegen is disabled or unsupported.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing `MergeRowsExec` / MERGE tests (CI).
   
   **Local benchmark** (`MergeRowsExecBenchmark`, 20M rows, Apple M4 Max, JDK 
21). Compared `origin/master` against this branch using an extended warm-up 
(15 s JIT warm-up per case, a timed window of at least 15 s, plus one untimed 
run per whole-stage codegen (WSCG) setting before measurement). Run with:
   
   ```bash
   SPARK_LOCAL_IP=127.0.0.1 JAVA_TOOL_OPTIONS="-Djava.net.preferIPv4Stack=true" \
     build/sbt -Dspark.driver.host=127.0.0.1 -Dspark.driver.bindAddress=127.0.0.1 \
     "sql/Test/runMain org.apache.spark.sql.execution.benchmark.MergeRowsExecBenchmark"
   ```
   
   **Whole-stage off (interpreted path)** — best time (ms):
   
   | Case | Before | After | Change |
   |------|-------:|------:|--------|
   | matched update only | 3505 | 3505* | — |
   | not matched insert only | 3624 | 1249 | −66% |
   | matched update + not matched insert | 3536 | 1276 | −64% |
   | matched delete | 2659 | 555 | −79% |
   | conditional clauses | 3990 | 1269 | −68% |
   | matched + not matched + not matched by source | 3517 | 1119 | −68% |
   | split update (delete + insert) | 3926 | 1346 | −66% |
   
   \*One after-run outlier (5432 ms, only 3 timed iterations); other cases ran 
12–26 iterations with low stdev.
   
   **Whole-stage on (codegen)** — unchanged, e.g. matched delete best 13 ms 
before and after.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

