viirya opened a new pull request, #56081:
URL: https://github.com/apache/spark/pull/56081

   ### What changes were proposed in this pull request?
   
   Follow-up to #56072 (SPARK-57024). That PR fixed the degenerate
   per-element loops in three bulk-fill methods (`OnHeap.putNulls`,
   `OnHeap.putInts(rowId, count, value)`, `OffHeap.putNulls`). The same
   pattern exists in six sibling methods; this PR applies the same
   intrinsic substitutions:
   
   | Method | Substitution |
   | --- | --- |
   | `OnHeapColumnVector.putBooleans(rowId, count, value)` | `Arrays.fill(byte[], ..., (byte) v)` |
   | `OnHeapColumnVector.putBytes(rowId, count, value)` | `Arrays.fill(byte[], ...)` |
   | `OnHeapColumnVector.putShorts(rowId, count, value)` | `Arrays.fill(short[], ...)` |
   | `OnHeapColumnVector.putLongs(rowId, count, value)` | `Arrays.fill(long[], ...)` |
   | `OffHeapColumnVector.putBooleans(rowId, count, value)` | `Platform.setMemory` with `SET_MEMORY_THRESHOLD` fallback |
   | `OffHeapColumnVector.putBytes(rowId, count, value)` | `Platform.setMemory` with `SET_MEMORY_THRESHOLD` fallback |
   
   The two OffHeap methods reuse the `SET_MEMORY_THRESHOLD = 128` constant
   introduced in #56072 for `OffHeap.putNulls`. Below the threshold, an
   inline byte loop avoids the JNI fixed cost of `Unsafe.setMemory`; at or
   above it, `setMemory` dominates and the speedup grows with `count`:
   about 3x at `count = 512`, about 7x at 4,096, and about 10x at 65,536
   (see the measurements below).
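
The threshold dispatch described above can be sketched as follows. This is a minimal standalone illustration, not the code in this PR: the class `ThresholdFillSketch` and the helpers `fillBytes` and `fillAndRead` are hypothetical names, and it calls `sun.misc.Unsafe` directly (obtained reflectively) where Spark goes through `Platform`. It assumes a JDK that still exposes the `sun.misc.Unsafe` memory-access methods, such as the JDK 21 used for the measurements; newer JDKs may print deprecation warnings.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class ThresholdFillSketch {
    // Same value as quoted in the PR text; the real constant lives in Spark.
    static final int SET_MEMORY_THRESHOLD = 128;

    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes `value` into `count` bytes starting at address `base + rowId`. */
    static void fillBytes(long base, int rowId, int count, byte value) {
        long addr = base + rowId;
        if (count < SET_MEMORY_THRESHOLD) {
            // Small fill: a plain loop avoids the fixed cost of the native call.
            for (int i = 0; i < count; i++) {
                UNSAFE.putByte(addr + i, value);
            }
        } else {
            // Large fill: one bulk call for the whole range.
            UNSAFE.setMemory(addr, count, value);
        }
    }

    /** Allocates a zeroed off-heap buffer, fills a range, and copies it to the heap. */
    static byte[] fillAndRead(int capacity, int rowId, int count, byte value) {
        long base = UNSAFE.allocateMemory(capacity);
        try {
            UNSAFE.setMemory(base, capacity, (byte) 0);
            fillBytes(base, rowId, count, value);
            byte[] out = new byte[capacity];
            for (int i = 0; i < capacity; i++) {
                out[i] = UNSAFE.getByte(base + i);
            }
            return out;
        } finally {
            UNSAFE.freeMemory(base);
        }
    }

    public static void main(String[] args) {
        byte[] small = fillAndRead(16, 4, 8, (byte) 7);       // loop path
        byte[] large = fillAndRead(8192, 100, 4096, (byte) 1); // setMemory path
        System.out.println(small[3] + " " + small[4] + " " + small[11] + " " + small[12]);
        System.out.println(large[99] + " " + large[100] + " " + large[4195] + " " + large[4196]);
    }
}
```

Both paths write the same bytes; only the cost profile differs, which is why the small-count rows in the measurements stay at parity.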
   
   This PR is based on top of #56072 since the threshold constant is
   defined there. If #56072 lands first, this PR rebases cleanly onto
   master.
   
   ### Why are the changes needed?
   
   The bulk-fill APIs on `WritableColumnVector` are the natural call to
   make from any column writer, but their implementations were per-element
   loops. Switching to intrinsics helps in two ways:
   
   - `Arrays.fill` is backed by HotSpot's `_jbyte_fill` / `_jshort_fill` /
     `_jlong_fill` intrinsic stubs; on byte/short arrays C2 can usually
     auto-vectorize the original loop and gains are modest, but for
     `long[]` and at small counts the intrinsic is meaningfully faster.
   - `Unsafe.setMemory` lowers to a native memset. For OffHeap byte fills
     this is dramatic at large counts because the original per-byte
     `Platform.putByte` loop cannot be vectorized through the JNI call.
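
For the on-heap case, the substitution amounts to replacing a per-element loop with a ranged `Arrays.fill`. The sketch below is illustrative only: the class `OnHeapFillSketch` and its method names are hypothetical, the backing arrays are passed in explicitly rather than held as fields as in `OnHeapColumnVector`, and the boolean-to-byte mapping (1 for true, 0 for false) is an assumption based on the `(byte) v` entry in the table above.

```java
import java.util.Arrays;

public class OnHeapFillSketch {
    /** Original shape: one store per element. */
    static void putLongsLoop(long[] data, int rowId, int count, long value) {
        for (int i = 0; i < count; i++) {
            data[rowId + i] = value;
        }
    }

    /** Patched shape: a single ranged fill over [rowId, rowId + count). */
    static void putLongsFill(long[] data, int rowId, int count, long value) {
        Arrays.fill(data, rowId, rowId + count, value);
    }

    /** Booleans are stored as bytes here; true -> 1, false -> 0 (assumed). */
    static void putBooleansFill(byte[] data, int rowId, int count, boolean value) {
        Arrays.fill(data, rowId, rowId + count, (byte) (value ? 1 : 0));
    }

    public static void main(String[] args) {
        long[] a = new long[8];
        long[] b = new long[8];
        putLongsLoop(a, 2, 4, 42L);
        putLongsFill(b, 2, 4, 42L);
        System.out.println(Arrays.equals(a, b)); // prints "true"

        byte[] flags = new byte[4];
        putBooleansFill(flags, 1, 2, true);
        System.out.println(Arrays.toString(flags)); // prints "[0, 1, 1, 0]"
    }
}
```

Note that `Arrays.fill` takes an exclusive end index, so the `count` argument becomes `rowId + count`.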
   
   Measured on Apple M4 Max with OpenJDK 21.0.8, using a new
   `WritableColumnVectorBulkFillBenchmark` (added in a separate change,
   not part of this PR). Rates are in millions of elements per second:
   
   **OffHeap byte fills (putBytes / putBooleans)**, threshold path:
   
   | count   | baseline | patched | delta |
   | ------: | -------: | ------: | ----- |
   | 8       | ~1,900   | ~1,840  | parity (small-count fallback) |
   | 64      | ~3,800   | ~3,760  | parity |
   | 512     | ~4,150   | ~13,100 | +3.2x |
   | 4,096   | ~4,340   | ~31,900 | +7.4x |
   | 65,536  | ~4,275   | ~43,700 | +10.2x |
   
   **OnHeap byte fills**:
   
   | count   | baseline | patched | delta |
   | ------: | -------: | ------: | ----- |
   | 8       | ~2,620   | ~3,230  | +23%  |
   | 64      | ~19,000  | ~25,400 | +33%  |
   | 512     | ~68,800  | ~86,200 | +25%  |
   | 4,096   | ~128,400 | ~133,300| +4%   |
   | 65,536  | ~143,200 | ~143,600| saturated (byte memory bandwidth) |
   
   **OnHeap longs**: +1-14% in the small/medium range, saturated by
   memory bandwidth at large counts. Included for consistency with the
   byte methods.
   
   OffHeap multi-byte fills (putShorts / putInts / putLongs / putFloats /
   putDoubles) are out of scope: `Platform.setMemory` is byte-only and the
   value=0 short-circuit alternative was prototyped under SPARK-57024 and
   showed no measurable gain.
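
To illustrate why a byte-only fill does not extend to multi-byte values in general, the sketch below fills two bytes with the same byte and reads them back as a `short`. It is a standalone illustration with hypothetical names (`ByteOnlyFillSketch`, `shortAfterByteFill`, `expressibleAsByteFill`) and uses a heap `byte[]` in place of off-heap memory; it is not code from this PR.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteOnlyFillSketch {
    /** Returns the short that results from filling both of its bytes with fillByte. */
    static short shortAfterByteFill(byte fillByte) {
        byte[] raw = new byte[2];
        Arrays.fill(raw, fillByte);
        // Both bytes are equal, so byte order does not affect the result.
        return ByteBuffer.wrap(raw).getShort(0);
    }

    /** True only when the high and low bytes of the value are identical. */
    static boolean expressibleAsByteFill(short value) {
        return (byte) (value >> 8) == (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(shortAfterByteFill((byte) 1));      // prints "257", not 1
        System.out.println(expressibleAsByteFill((short) 1));  // prints "false"
        System.out.println(expressibleAsByteFill((short) 0));  // prints "true"
    }
}
```

Only values whose bytes are all identical, such as 0, can be written with a byte fill, which is why the prototype mentioned above was limited to the value=0 case.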
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests; no behavior change. Ran locally on top of #56072:
   
   - `VectorizedRleValuesReaderSuite`
   - `ColumnVectorSuite`
   - `ColumnarBatchSuite`
   - `ParquetIOSuite`
   
   237 tests, all pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.7)
   