Re: [PR] [SPARK-56804][SQL] Add bulk read+convert path for DATE to TimestampNTZ Parquet vector updater [spark]

via GitHub Thu, 14 May 2026 23:11:04 -0700


LuciferYang commented on PR #55855:
URL: https://github.com/apache/spark/pull/55855#issuecomment-4457362978


   Closing this. The cross-JDK benchmark run shows no consistent speedup: JDK 
17 is flat, JDK 21 is ~6% faster, and JDK 25 is ~11% slower, all within noise:
   
   | JDK | Baseline | After this PR | Delta (positive = faster) |
   |---|---|---|---|
   | 17 | 33.9 ns/row | 33.9 ns/row | 0% |
   | 21 | 27.5 ns/row | 25.9 ns/row | ~+6% |
   | 25 | 24.7 ns/row | 27.3 ns/row | ~-11% |
   
   Root cause: the per-row bottleneck is `DateTimeUtils.daysToMicros(days, 
zoneId)` itself, which constructs a `LocalDate`, then a `ZonedDateTime`, then 
an `Instant` for every value, and that dominates the ~25-34 ns/row baseline 
cost. The bulk-read pattern that delivered 3-14× for the sibling PRs 
(SPARK-56791 / SPARK-56801 / SPARK-56802 / SPARK-56803) saves the per-row 
virtual dispatch on `readInteger()`, but that is only a few ns per row and 
disappears into the conversion overhead here.
   
   Will follow up with a focused PR that fast-paths `daysToMicros` when the 
zone is `ZoneOffset.UTC` (mathematically `days * MICROS_PER_DAY`, no allocation 
needed), since that is where the real win lives.
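
A minimal sketch of what such a fast path could look like, written as standalone Java rather than the actual Spark code. The class name `DaysToMicrosSketch`, the helper `daysToMicrosGeneral`, and the locally defined `MICROS_PER_DAY` constant are assumptions for illustration; only the `LocalDate` -> `ZonedDateTime` -> `Instant` chain and the `days * MICROS_PER_DAY` identity come from the comment above.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical standalone sketch; not the Spark implementation.
final class DaysToMicrosSketch {
    // Microseconds in one 24-hour day (defined locally for this sketch).
    static final long MICROS_PER_DAY = 86_400_000_000L;

    // General path: LocalDate -> ZonedDateTime -> Instant, allocating on every call.
    static long daysToMicrosGeneral(int days, ZoneId zoneId) {
        Instant instant = LocalDate.ofEpochDay(days).atStartOfDay(zoneId).toInstant();
        long micros = Math.multiplyExact(instant.getEpochSecond(), 1_000_000L);
        return Math.addExact(micros, instant.getNano() / 1_000L);
    }

    // Fast path: in UTC, midnight of epoch day N is exactly N * MICROS_PER_DAY.
    static long daysToMicros(int days, ZoneId zoneId) {
        if (ZoneOffset.UTC.equals(zoneId)) {
            return Math.multiplyExact((long) days, MICROS_PER_DAY);
        }
        return daysToMicrosGeneral(days, zoneId);
    }

    public static void main(String[] args) {
        int[] samples = {-1, 0, 1, 19_000};
        for (int days : samples) {
            long fast = daysToMicros(days, ZoneOffset.UTC);
            long general = daysToMicrosGeneral(days, ZoneOffset.UTC);
            System.out.println(days + " -> " + fast + " (matches general path: " + (fast == general) + ")");
        }
    }
}
```

Note that this check only matches the `ZoneOffset.UTC` instance: `ZoneId.of("UTC")` is a region-based ID that does not compare equal to it, so it would still take the general path in this sketch. Whether a real fast path should also cover other fixed offsets is left open here.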


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


