viirya commented on PR #56072: URL: https://github.com/apache/spark/pull/56072#issuecomment-4535881350
@LuciferYang Got an EPYC 7763 runner this time. The CPU-matched JDK 17 picture is cleaner but still shows a small regression on the Group C (def-level materialization) path:

| Group | nullRatio, shape | master (7763) | this PR (7763) | delta |
| --- | --- | ---: | ---: | ---: |
| C (with def-levels) | 0.5 random | 13.7 | 14.8 | -7% |
| C (with def-levels) | 0.5 clustered | 6.2 | 6.6 | -6% |
| C (with def-levels) | 0.9 clustered | 5.7 | 6.0 | -5% |
| D (without def-levels) | 0.5 random | 11.4 | 11.2 | +2% |
| D (without def-levels) | 0.5 clustered | 5.2 | 5.0 | +4% |
| D (without def-levels) | 0.9 clustered | 5.0 | 4.6 | +8% |

The regression is concentrated on the def-level write path (Group C uses `defLevels.putInts(...)`, which is the bulk fill we introduced), and disappears on Group D (which only touches `nulls.putNulls`). Group D shows the parity-to-mild-improvement we'd expect from the intrinsic.

I think this is HotSpot 17 specific: `Arrays.fill(int[])` / `_jint_fill` on JDK 17 appears to be slower than the C2 auto-vectorized loop it replaces. Comparing the same CPU on JDK 21:

| Group C, JDK 21 (same 7763) | master | this PR | delta |
| --- | ---: | ---: | ---: |
| 0.5 random | 13.1 | 11.4 | **+13%** |
| 0.5 clustered | 5.8 | 4.7 | **+19%** |
| 0.9 clustered | 5.5 | 4.3 | **+22%** |

So the change is a clear win on JDK 21 / 25, parity on JDK 17 `putNulls`-only paths (Group D), and a ~4–9% regression on JDK 17 def-level materialization (Group C).

Happy to drop the `OnHeap.putInts(rowId, count, value)` change if you'd prefer to avoid the JDK 17 regression entirely. That would keep the `putNulls` wins (Group D) for all JDKs while losing the JDK 21+ Group C wins.
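For anyone following along, here is a minimal sketch of the two code shapes being compared, assuming a plain `int[]` backing array. `IntFillSketch`, `putIntsLoop`, and `putIntsFill` are illustrative names only, not the actual Spark code touched by this PR, and the sketch only checks that the two shapes write the same values; it says nothing about their relative speed.

```java
import java.util.Arrays;

// Hypothetical stand-in for an on-heap int column; not Spark's OnHeapColumnVector.
final class IntFillSketch {

    // Scalar loop: the shape that C2 may auto-vectorize.
    static void putIntsLoop(int[] data, int rowId, int count, int value) {
        for (int i = 0; i < count; i++) {
            data[rowId + i] = value;
        }
    }

    // Bulk fill: Arrays.fill(int[], fromIndex, toIndex, value), toIndex exclusive.
    static void putIntsFill(int[] data, int rowId, int count, int value) {
        Arrays.fill(data, rowId, rowId + count, value);
    }

    public static void main(String[] args) {
        int[] a = new int[4096];
        int[] b = new int[4096];
        putIntsLoop(a, 100, 1000, 1);
        putIntsFill(b, 100, 1000, 1);
        // Both variants should produce identical array contents.
        System.out.println(Arrays.equals(a, b)); // prints "true"
    }
}
```

Both variants are functionally equivalent, so any difference comes down to how a given JVM compiles each one. Measuring that reliably would need a proper harness such as JMH rather than ad hoc timing.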
