viirya commented on PR #56072:
URL: https://github.com/apache/spark/pull/56072#issuecomment-4535881350

   @LuciferYang Got an EPYC 7763 runner this time. The CPU-matched JDK 17 
picture is cleaner but still shows a small regression on the Group C (def-level 
materialization) path (lower is better; a positive delta means this PR is 
faster):
   
   | Group | nullRatio, shape | master (7763) | this PR (7763) | delta |
   | --- | --- | ---: | ---: | ---: |
   | C — with def-levels | 0.5 random | 13.7 | 14.8 | -7% |
   | C — with def-levels | 0.5 clustered | 6.2 | 6.6 | -6% |
   | C — with def-levels | 0.9 clustered | 5.7 | 6.0 | -5% |
   | D — without def-levels | 0.5 random | 11.4 | 11.2 | +2% |
   | D — without def-levels | 0.5 clustered | 5.2 | 5.0 | +4% |
   | D — without def-levels | 0.9 clustered | 5.0 | 4.6 | +8% |
   
   The regression is concentrated on the def-level write path (Group C uses 
`defLevels.putInts(...)`, which is the bulk fill we introduced) and disappears 
on Group D (which only touches `nulls.putNulls`). Group D shows the 
parity-to-mild-improvement we'd expect from the intrinsic.
   
   I think this is HotSpot 17-specific: `Arrays.fill(int[])` / `_jint_fill` on 
JDK 17 appears to be slower than the C2 auto-vectorized loop it replaces. 
For comparison, the same CPU on JDK 21:
   
   | Group C, JDK 21 (same 7763) | master | this PR | delta |
   | --- | ---: | ---: | ---: |
   | 0.5 random | 13.1 | 11.4 | **+13%** |
   | 0.5 clustered | 5.8 | 4.7 | **+19%** |
   | 0.9 clustered | 5.5 | 4.3 | **+22%** |
   
   So the change is a clear win on JDK 21 / 25, parity on JDK 17 
`putNulls`-only paths (Group D), and a ~4–9% regression on JDK 17 def-level 
materialization (Group C). Happy to drop the `OnHeap.putInts(rowId, count, 
value)` change if you'd prefer to avoid the JDK 17 regression entirely — that 
would keep the `putNulls` wins (Group D) for all JDKs while losing the JDK 21+ 
Group C wins.
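
   For reference, a minimal sketch of the two shapes being compared. This is 
not the actual Spark code: `FillSketch`, `putIntsLoop`, and `putIntsFill` are 
hypothetical names, and a plain `int[]` stands in for the on-heap buffer. Only 
`Arrays.fill(int[], int, int, int)` is a real JDK call.

```java
import java.util.Arrays;

// Hypothetical stand-in for an on-heap int column buffer; not the Spark class.
public class FillSketch {

    // Scalar loop: the shape that C2 can auto-vectorize.
    static void putIntsLoop(int[] data, int rowId, int count, int value) {
        for (int i = 0; i < count; i++) {
            data[rowId + i] = value;
        }
    }

    // Bulk fill: delegates to Arrays.fill over [rowId, rowId + count).
    static void putIntsFill(int[] data, int rowId, int count, int value) {
        Arrays.fill(data, rowId, rowId + count, value);
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        int[] b = new int[8];
        putIntsLoop(a, 2, 3, 1);
        putIntsFill(b, 2, 3, 1);
        // Both variants write the same range, so the arrays match.
        System.out.println(Arrays.equals(a, b));
    }
}
```

   Both variants are functionally equivalent, so any difference between them 
comes down to how a given JIT compiles each shape.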
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

