iemejia commented on PR #55919: URL: https://github.com/apache/spark/pull/55919#issuecomment-4692346315
@LuciferYang Sorry for the extra churn. I added one more commit with `readIntegersAsLongs` and `readIntegersAsDoubles` overrides for the DELTA_BINARY_PACKED reader. It seemed worth including because the delta decoder already works on `long[]` internally, so these overrides skip the int narrowing step entirely and write longs/doubles directly from the prefix-sum buffer.

A local benchmark shows a **2.1x** speedup for `readIntegersAsLongs` and **2.0x** for `readIntegersAsDoubles` compared with the per-row default path. This benefits `DateToTimestampNTZUpdater`, `IntegerToLongUpdater`, and `IntegerToDoubleUpdater` when reading Parquet V2 DELTA_BINARY_PACKED encoded INT32 columns. The improvement carries over automatically via the two-pass updater pattern in PR #55923.

The review delta is small (25 lines of new code in the reader plus 8 lines of benchmark cases), in case you want to focus on just the new commit.
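
For readers following along without the diff open, here is a minimal standalone sketch of the idea. It is not the code from this PR: the class `DeltaWideningSketch` and its methods are hypothetical names, and the "per-row" path is only a simplified model of a default that narrows each value to `int` before widening it again. It also assumes every reconstructed value already fits in the INT32 range, so wrapped deltas are not handled.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Spark.
final class DeltaWideningSketch {

  // Rebuilds absolute values from a first value plus deltas (a prefix sum),
  // keeping everything in a long[] buffer as a delta decoder might.
  static long[] prefixSum(long firstValue, long[] deltas) {
    long[] values = new long[deltas.length + 1];
    values[0] = firstValue;
    for (int i = 0; i < deltas.length; i++) {
      values[i + 1] = values[i] + deltas[i];
    }
    return values;
  }

  // Modelled default path: narrow each value to int, then widen it back to long.
  static void readAsLongsViaInt(long[] decoded, long[] out) {
    for (int i = 0; i < decoded.length; i++) {
      int narrowed = (int) decoded[i]; // narrowing step
      out[i] = narrowed;               // implicit widening back to long
    }
  }

  // Direct path: the buffer is already long[], so copy it in bulk.
  // Assumption: values are within INT32 range, so no narrowing is needed.
  static void readAsLongsDirect(long[] decoded, long[] out) {
    System.arraycopy(decoded, 0, out, 0, decoded.length);
  }

  // Direct path for doubles: convert long to double without an int detour.
  static void readAsDoublesDirect(long[] decoded, double[] out) {
    for (int i = 0; i < decoded.length; i++) {
      out[i] = (double) decoded[i];
    }
  }

  public static void main(String[] args) {
    long[] decoded = prefixSum(10L, new long[] {1L, -3L, 5L});
    long[] viaInt = new long[decoded.length];
    long[] direct = new long[decoded.length];
    double[] doubles = new double[decoded.length];

    readAsLongsViaInt(decoded, viaInt);
    readAsLongsDirect(decoded, direct);
    readAsDoublesDirect(decoded, doubles);

    System.out.println(Arrays.toString(direct));  // [10, 11, 8, 13]
    System.out.println(Arrays.equals(viaInt, direct));
    System.out.println(Arrays.toString(doubles));
  }
}
```

Under the stated in-range assumption both long paths produce the same output. The direct versions just avoid the per-value round trip through `int`, which is the kind of saving the overrides are aiming for.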
