szehon-ho opened a new pull request, #55967:
URL: https://github.com/apache/spark/pull/55967
### What changes were proposed in this pull request?
Cache `SQLMetric` references once per partition in `MergeRowIterator` and
update them directly in the hot loop. Previously, each row called
`longMetric("…")`, which performs a `metrics(name)` map lookup on every
increment (up to 2–3 lookups per delete/update row).
This matches the pattern used elsewhere (e.g. `FilterEvaluatorFactory`
passes a `SQLMetric` into the partition evaluator). The whole-stage codegen
path is unchanged; it already resolves metrics once via `metricTerm`.
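
As a rough illustration of the pattern described above (not the actual `MergeRowIterator` code), the sketch below contrasts a per-row lookup by name with references resolved once per partition. `Metric`, `MetricCachingSketch`, the metric names, and both helper methods are hypothetical stand-ins; only the idea of hoisting the map lookup out of the hot loop comes from this PR.

```scala
// Hypothetical stand-in for Spark's SQLMetric; it only accumulates a counter.
final class Metric {
  private var total = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

object MetricCachingSketch {
  // Illustrative metric names, not the real MERGE metric keys.
  def newMetrics(): Map[String, Metric] =
    Map("rowsDeleted" -> new Metric, "rowsMatchedDeleted" -> new Metric)

  // Before: every row resolves each metric by name, one map lookup per increment.
  def countWithLookups(rows: Iterator[Int], metrics: Map[String, Metric]): Unit =
    rows.foreach { _ =>
      metrics("rowsDeleted").add(1)
      metrics("rowsMatchedDeleted").add(1)
    }

  // After: resolve the references once, then update them directly in the loop.
  def countWithCachedRefs(rows: Iterator[Int], metrics: Map[String, Metric]): Unit = {
    val deleted = metrics("rowsDeleted")
    val matchedDeleted = metrics("rowsMatchedDeleted")
    rows.foreach { _ =>
      deleted.add(1)
      matchedDeleted.add(1)
    }
  }

  def main(args: Array[String]): Unit = {
    val before = newMetrics()
    val after = newMetrics()
    countWithLookups(Iterator.range(0, 1000), before)
    countWithCachedRefs(Iterator.range(0, 1000), after)
    // Both variants must report the same totals; only the lookup cost differs.
    assert(before("rowsDeleted").value == after("rowsDeleted").value)
    assert(before("rowsMatchedDeleted").value == after("rowsMatchedDeleted").value)
    println(s"rowsDeleted = ${after("rowsDeleted").value}")
  }
}
```

The two variants are observably equivalent, which is why no user-facing change is expected; the second simply does less work per row.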
### Why are the changes needed?
`MergeRowsExec` updates multiple MERGE metrics per output row on the
interpreted path (`doExecute` / `MergeRowIterator`). For delete-heavy workloads
with little projection work, repeated map lookups were a noticeable fraction of
per-row cost. Production MERGE typically runs with whole-stage codegen enabled,
but the interpreted path is still used when codegen is disabled or unsupported.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing `MergeRowsExec` / MERGE tests (CI).
**Local benchmark** (`MergeRowsExecBenchmark`, 20M rows, Apple M4 Max, JDK
21). Compared `origin/master` against this branch with an extended warm-up:
15s of JIT warm-up per case, a timed window of at least 15s, and one untimed
run per whole-stage codegen (WSCG) setting before measurement. Run with:
```bash
SPARK_LOCAL_IP=127.0.0.1 JAVA_TOOL_OPTIONS="-Djava.net.preferIPv4Stack=true" \
build/sbt -Dspark.driver.host=127.0.0.1 -Dspark.driver.bindAddress=127.0.0.1 \
  "sql/Test/runMain org.apache.spark.sql.execution.benchmark.MergeRowsExecBenchmark"
```
**Whole-stage off (interpreted path)** — best time (ms):
| Case | Before | After | Change |
|------|-------:|------:|--------|
| matched update only | 3505 | 3505* | — |
| not matched insert only | 3624 | 1249 | −66% |
| matched update + not matched insert | 3536 | 1276 | −64% |
| matched delete | 2659 | 555 | −79% |
| conditional clauses | 3990 | 1269 | −68% |
| matched + not matched + not matched by source | 3517 | 1119 | −68% |
| split update (delete + insert) | 3926 | 1346 | −66% |
\*One after-run outlier (5432 ms, only 3 timed iterations); other cases ran
12–26 iterations with low stdev.
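
For reference, the Change column in the table above is consistent with the relative difference of the best times, rounded to the nearest percent. The small check below assumes that definition; `ChangeColumnSketch` and `percentChange` are illustrative names, and the inputs are copied from two rows of the table.

```scala
object ChangeColumnSketch {
  // Relative change in percent, rounded to the nearest integer.
  def percentChange(beforeMs: Long, afterMs: Long): Long =
    math.round((afterMs - beforeMs).toDouble / beforeMs * 100)

  def main(args: Array[String]): Unit = {
    // (case, before ms, after ms) taken from the table above.
    val rows = Seq(
      ("not matched insert only", 3624L, 1249L),
      ("matched delete", 2659L, 555L)
    )
    rows.foreach { case (name, beforeMs, afterMs) =>
      println(s"$name: ${percentChange(beforeMs, afterMs)}%")
    }
  }
}
```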
**Whole-stage on (codegen)** — unchanged, e.g. matched delete best 13 ms
before and after.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]