Licht-T opened a new pull request, #56848:
URL: https://github.com/apache/spark/pull/56848

   ### What changes were proposed in this pull request?
   
   `date_trunc` (`TruncTimestamp`) resolves the session zone offset for each 
row via `ZoneRules.getOffset(Instant)`, which is a binary search over the zone's 
transition array. For non-fixed-offset zones it does so twice per row: once for 
the input instant and once for the candidate truncated instant used by the 
DST-equality guard from SPARK-56663 / SPARK-56769.
   
   This PR adds a per-task `ZoneOffsetCache` that memoizes the resolved offset 
over the half-open epoch-second interval `[lo, hi)` on which it is provably 
constant. The interval is derived from the surrounding zone transitions 
(`nextTransition` / `previousTransition`) and is anchored on an interior point 
to avoid an off-by-one when an instant sits exactly on a transition. A lookup 
inside the cached interval reduces to two comparisons instead of a binary search.
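
The lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class name `ZoneOffsetCacheSketch`, the `slowPathCalls` counter, and the choice to query `previousTransition` from one second past the looked-up instant are all hypothetical. Like a per-task cache, the sketch is not thread-safe.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

/** Illustrative sketch only; not the ZoneOffsetCache class from this PR. */
final class ZoneOffsetCacheSketch {
    private final ZoneRules rules;
    // Cached window [lo, hi) in epoch seconds; starts empty so the first lookup misses.
    private long lo = 0L;
    private long hi = 0L;
    private int cachedOffsetSeconds = 0;
    // Hypothetical counter, present only to make the hit rate observable.
    long slowPathCalls = 0L;

    ZoneOffsetCacheSketch(ZoneId zone) {
        this.rules = zone.getRules();
    }

    /** Total offset from UTC, in seconds, at the given epoch second. */
    int offsetSeconds(long epochSecond) {
        // Fast path: two comparisons against the cached window.
        if (epochSecond >= lo && epochSecond < hi) {
            return cachedOffsetSeconds;
        }
        return refill(epochSecond);
    }

    private int refill(long epochSecond) {
        slowPathCalls++;
        Instant instant = Instant.ofEpochSecond(epochSecond);
        // nextTransition is strictly after the instant, so it is a valid exclusive upper bound.
        ZoneOffsetTransition next = rules.nextTransition(instant);
        // previousTransition is strictly before its argument. Asking from one second later
        // yields a transition at or before epochSecond, so a lookup that lands exactly on a
        // transition gets that transition (not the one before it) as its lower bound.
        ZoneOffsetTransition prev = rules.previousTransition(instant.plusSeconds(1));
        lo = (prev == null) ? Long.MIN_VALUE : prev.toEpochSecond();
        hi = (next == null) ? Long.MAX_VALUE : next.toEpochSecond();
        cachedOffsetSeconds = rules.getOffset(instant).getTotalSeconds();
        return cachedOffsetSeconds;
    }
}
```

In this sketch a caller would construct one instance per task with the session `ZoneId` and call `offsetSeconds` for each row; only lookups that cross a transition boundary reach `refill`.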
   
   ### Why are the changes needed?
   
   The session time zone is constant for a query, and a zone's offset is 
piecewise-constant between DST/historical transitions. Analytic data is 
typically temporally clustered (time series, date-partitioned tables, sorted 
input), so consecutive rows almost always fall in the same constant-offset 
window. Repeating the transition-array binary search on every row is therefore 
redundant work on the hot path.
   
   `DateTimeBenchmark` Truncation results with whole-stage codegen on, session 
zone `America/Los_Angeles`, OpenJDK 17 on a 12th Gen Intel i7-1260P, in ns/row 
(lower is better):
   
   | case | without cache (ns/row) | with cache (ns/row) | speedup |
   |------|-----------------------:|--------------------:|--------:|
   | date_trunc YEAR | 98.2 | 56.8 | 1.73x |
   | date_trunc QUARTER | 109.3 | 71.7 | 1.52x |
   | date_trunc MONTH | 90.8 | 53.7 | 1.69x |
   | date_trunc WEEK | 77.8 | 40.6 | 1.92x |
   | date_trunc DAY | 64.8 | 33.0 | 1.96x |
   | date_trunc SECOND (control) | 28.7 | 27.7 | ~1.0x |
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing `DateTimeUtilsSuite` and `DateExpressionsSuite` pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes, co-authored with Claude Code.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

