iemejia opened a new pull request, #56543: URL: https://github.com/apache/spark/pull/56543
### What changes were proposed in this pull request?

Re-apply the bulk read optimization for `VectorizedDeltaBinaryPackedReader` (reverted in c13302acc2a) with a fix for the INT32 widening bug that caused the CI failure.

**Commit 1.** Reapply the original optimization (revert of the revert):
- Bulk `readIntegers`/`readLongs` via prefix-sum + `putInts`/`putLongs`
- Zero-allocation unsigned long encoding (`encodeUnsignedLongBigEndian`)
- `readIntegersAsLongs` and `readIntegersAsDoubles` overrides

**Commit 2.** Fix the INT32 widening bug:
- The Parquet INT32 delta encoder (`DeltaBinaryPackingValuesWriterForInteger`) computes deltas using Java int arithmetic with modular overflow. The bulk widened readers (`readIntegersAsLongs`, `readIntegersAsDoubles`) were performing the prefix sum in long space and writing raw long results without truncating back to int. When delta overflow occurs (e.g. a sequence containing `Int.MinValue`), the reconstructed long has the wrong sign.
- Fix: truncate each prefix-sum result to int before widening to long/double
- Add focused low-level tests for the overflow case (single-batch and split reads)
- Add benchmark cases for the overflow pattern

This is the same content as #55919, which was merged and reverted due to this bug.

### Why are the changes needed?

The bulk read path eliminates per-value lambda dispatch overhead and enables the JIT to better vectorize the inner unpacking loop. See #55919 for full benchmark results.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- `ParquetTypeWideningSuite`: IntegerType -> LongType, IntegerType -> DoubleType
- `ParquetDeltaEncodingInteger`: new focused tests for modular delta overflow
- `ParquetDeltaEncodingInteger`/`Long`: full suites (30 tests)
- `ParquetIOSuite`: UINT_64 tests
- `VectorizedDeltaReaderBenchmark`: full suite including new overflow cases

### Was this patch authored or co-authored using generative AI tooling?

Yes.
Assisted-by: GitHub Copilot:claude-opus-4.6

cc @LuciferYang @sunchao
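
### Illustration of the INT32 widening bug (sketch)

For reviewers who want to see the overflow from Commit 2 in isolation, here is a minimal standalone sketch. It is not the code from this PR: the class `DeltaWideningSketch` and its methods are hypothetical names, and it works on plain delta arrays, skipping the block, min-delta, and bit-packing layers of the real DELTA_BINARY_PACKED format. It only assumes what the description states: the encoder computes deltas with wrapping int arithmetic, and the widened reader accumulates in long space.

```java
import java.util.Arrays;

// Hypothetical sketch, not Spark or Parquet code.
public class DeltaWideningSketch {

    // Encoder side (simplified): deltas use 32-bit arithmetic and may wrap around.
    static int[] encodeDeltas(int[] values) {
        int[] deltas = new int[values.length - 1];
        for (int i = 1; i < values.length; i++) {
            deltas[i - 1] = values[i] - values[i - 1];
        }
        return deltas;
    }

    // Buggy pattern: prefix sum in long space, raw long written out.
    static long[] widenWithoutTruncation(int first, int[] deltas) {
        long[] out = new long[deltas.length + 1];
        long acc = first;
        out[0] = acc;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            out[i + 1] = acc;
        }
        return out;
    }

    // Fixed pattern: truncate each prefix-sum result to int, then widen.
    static long[] widenWithTruncation(int first, int[] deltas) {
        long[] out = new long[deltas.length + 1];
        long acc = first;
        out[0] = acc;
        for (int i = 0; i < deltas.length; i++) {
            acc = (int) (acc + deltas[i]); // wrap back into int range
            out[i + 1] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {1, Integer.MIN_VALUE, 7};
        int[] deltas = encodeDeltas(values); // first delta wraps to Integer.MAX_VALUE

        // prints [1, 2147483648, 7]: the second value has the wrong sign
        System.out.println(Arrays.toString(widenWithoutTruncation(values[0], deltas)));
        // prints [1, -2147483648, 7]: matches the original ints
        System.out.println(Arrays.toString(widenWithTruncation(values[0], deltas)));
    }
}
```

In this example the first delta `Integer.MIN_VALUE - 1` wraps to `Integer.MAX_VALUE`, so the untruncated long sum lands at `2147483648` instead of `-2147483648`. Casting each partial sum to int undoes the wrap before the value is widened.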
