[jira] [Commented] (SPARK-31297) Speed-up date-time rebasing
[ https://issues.apache.org/jira/browse/SPARK-31297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070457#comment-17070457 ]

Maxim Gekk commented on SPARK-31297:

Rebasing of days doesn't depend on the time zone, and the diff changes at just 14 special dates:
{code:scala}
test("optimize rebasing") {
  val start = localDateToDays(LocalDate.of(1, 1, 1))
  val end = localDateToDays(LocalDate.of(2030, 1, 1))

  var days = start
  var diff = Long.MaxValue
  var counter = 0
  // Scan day by day and report every day on which the diff changes.
  while (days < end) {
    val rebased = rebaseGregorianToJulianDays(days)
    val curDiff = rebased - days
    if (curDiff != diff) {
      counter += 1
      diff = curDiff
      val ld = daysToLocalDate(days)
      println(s"local date = $ld days = $days diff = ${diff} days")
    }
    days += 1
  }
  println(s"counter = $counter")
}
{code}
{code}
local date = 0001-01-01 days = -719162 diff = -2 days
local date = 0100-03-01 days = -682944 diff = -1 days
local date = 0200-03-01 days = -646420 diff = 0 days
local date = 0300-03-01 days = -609896 diff = 1 days
local date = 0500-03-01 days = -536847 diff = 2 days
local date = 0600-03-01 days = -500323 diff = 3 days
local date = 0700-03-01 days = -463799 diff = 4 days
local date = 0900-03-01 days = -390750 diff = 5 days
local date = 1000-03-01 days = -354226 diff = 6 days
local date = 1100-03-01 days = -317702 diff = 7 days
local date = 1300-03-01 days = -244653 diff = 8 days
local date = 1400-03-01 days = -208129 diff = 9 days
local date = 1500-03-01 days = -171605 diff = 10 days
local date = 1582-10-15 days = -141427 diff = 0 days
counter = 14
{code}

> Speed-up date-time rebasing
> ---
>
> Key: SPARK-31297
> URL: https://issues.apache.org/jira/browse/SPARK-31297
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Major
>
> I believe it is possible to speed up date-time rebasing by building a map
> of micros to diffs between original and rebased micros, and looking up the
> map via binary search.
> For example, the *America/Los_Angeles* time zone has fewer than 100 points
> at which the diff changes:
> {code:scala}
> test("optimize rebasing") {
>   val start = instantToMicros(LocalDateTime.of(1, 1, 1, 0, 0, 0)
>     .atZone(getZoneId("America/Los_Angeles"))
>     .toInstant)
>   val end = instantToMicros(LocalDateTime.of(2030, 1, 1, 0, 0, 0)
>     .atZone(getZoneId("America/Los_Angeles"))
>     .toInstant)
>   var micros = start
>   var diff = Long.MaxValue
>   var counter = 0
>   while (micros < end) {
>     val rebased = rebaseGregorianToJulianMicros(micros)
>     val curDiff = rebased - micros
>     if (curDiff != diff) {
>       counter += 1
>       diff = curDiff
>       val ldt = microsToInstant(micros).atZone(getZoneId("America/Los_Angeles")).toLocalDateTime
>       println(s"local date-time = $ldt diff = ${diff / MICROS_PER_MINUTE} minutes")
>     }
>     micros += MICROS_PER_HOUR
>   }
>   println(s"counter = $counter")
> }
> {code}
> {code:java}
> local date-time = 0001-01-01T00:00 diff = -2909 minutes
> local date-time = 0100-02-28T14:00 diff = -1469 minutes
> local date-time = 0200-02-28T14:00 diff = -29 minutes
> local date-time = 0300-02-28T14:00 diff = 1410 minutes
> local date-time = 0500-02-28T14:00 diff = 2850 minutes
> local date-time = 0600-02-28T14:00 diff = 4290 minutes
> local date-time = 0700-02-28T14:00 diff = 5730 minutes
> local date-time = 0900-02-28T14:00 diff = 7170 minutes
> local date-time = 1000-02-28T14:00 diff = 8610 minutes
> local date-time = 1100-02-28T14:00 diff = 10050 minutes
> local date-time = 1300-02-28T14:00 diff = 11490 minutes
> local date-time = 1400-02-28T14:00 diff = 12930 minutes
> local date-time = 1500-02-28T14:00 diff = 14370 minutes
> local date-time = 1582-10-14T14:00 diff = -29 minutes
> local date-time = 1899-12-31T16:52:58 diff = 0 minutes
> local date-time = 1917-12-27T11:52:58 diff = 60 minutes
> local date-time = 1917-12-27T12:52:58 diff = 0 minutes
> local date-time = 1918-09-15T12:52:58 diff = 60 minutes
> local date-time = 1918-09-15T13:52:58 diff = 0 minutes
> local date-time = 1919-06-30T16:52:58 diff = 31 minutes
> local date-time = 1919-06-30T17:52:58 diff = 0 minutes
> local date-time = 1919-08-15T12:52:58 diff = 60 minutes
> local date-time = 1919-08-15T13:52:58 diff = 0 minutes
> local date-time = 1921-08-31T10:52:58 diff = 60 minutes
> local date-time = 1921-08-31T11:52:58 diff = 0 minutes
> local date-time = 1921-09-30T11:52:58 diff = 60 minutes
> local date-time = 1921-09-30T12:52:58 diff = 0 minutes
> local date-time = 1922-09-30T12:52:58 diff = 60 minutes
> local date-time = 1922-09-30T13:52:58 diff = 0 minutes
> local date-time = 1981-09-30T12:52:58 diff = 60 minutes
> local date-time =
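
The 14 switch days printed in the comment above are enough to sketch the proposed lookup for the days case. The following is a minimal illustration, not Spark's implementation: the object name `RebaseDaysSketch` and the arrays are hypothetical, the values are copied from the printed output, and days before 0001-01-01 are rejected because the scan in the comment did not cover them.

```scala
import java.util.Arrays

// Hypothetical sketch of a table-driven Gregorian -> Julian rebase for days.
object RebaseDaysSketch {
  // Days since 1970-01-01 on which the diff changes (ascending),
  // copied from the output printed in the comment above.
  val switchDays: Array[Int] = Array(
    -719162, -682944, -646420, -609896, -536847, -500323, -463799,
    -390750, -354226, -317702, -244653, -208129, -171605, -141427)

  // diffs(i) applies to every day in [switchDays(i), switchDays(i + 1));
  // the last entry applies to all days from 1582-10-15 onwards.
  val diffs: Array[Int] = Array(-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0)

  def rebaseGregorianToJulianDays(days: Int): Int = {
    val pos = Arrays.binarySearch(switchDays, days)
    // On a miss, binarySearch returns -(insertionPoint) - 1, so the
    // interval containing `days` starts at index insertionPoint - 1.
    val idx = if (pos >= 0) pos else -pos - 2
    // Assumption: dates before 0001-01-01 are outside the measured range.
    require(idx >= 0, s"days = $days is before 0001-01-01")
    days + diffs(idx)
  }
}
```

Each lookup is a binary search over 14 integers followed by one addition, so no calendar objects are created per value. For instance, `RebaseDaysSketch.rebaseGregorianToJulianDays(-719162)` adds the first diff of -2 days, and any day on or after -141427 (1582-10-15) is returned unchanged.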
[jira] [Commented] (SPARK-31297) Speed-up date-time rebasing
[ https://issues.apache.org/jira/browse/SPARK-31297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070286#comment-17070286 ]

Maxim Gekk commented on SPARK-31297:

[~cloud_fan] [~hyukjin.kwon] [~dongjoon] WDYT?

> Speed-up date-time rebasing
> ---
>
> Key: SPARK-31297
> URL: https://issues.apache.org/jira/browse/SPARK-31297
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Major
>
> I believe it is possible to speed up date-time rebasing by building a map
> of micros to diffs between original and rebased micros, and looking up the
> map via binary search.
> For example, the *America/Los_Angeles* time zone has fewer than 100 points
> at which the diff changes:
> {code:scala}
> test("optimize rebasing") {
>   val start = instantToMicros(LocalDateTime.of(1, 1, 1, 0, 0, 0)
>     .atZone(getZoneId("America/Los_Angeles"))
>     .toInstant)
>   val end = instantToMicros(LocalDateTime.of(2030, 1, 1, 0, 0, 0)
>     .atZone(getZoneId("America/Los_Angeles"))
>     .toInstant)
>   var micros = start
>   var diff = Long.MaxValue
>   var counter = 0
>   while (micros < end) {
>     val rebased = rebaseGregorianToJulianMicros(micros)
>     val curDiff = rebased - micros
>     if (curDiff != diff) {
>       counter += 1
>       diff = curDiff
>       val ldt = microsToInstant(micros).atZone(getZoneId("America/Los_Angeles")).toLocalDateTime
>       println(s"local date-time = $ldt diff = ${diff / MICROS_PER_MINUTE} minutes")
>     }
>     micros += MICROS_PER_HOUR
>   }
>   println(s"counter = $counter")
> }
> {code}
> {code:java}
> local date-time = 0001-01-01T00:00 diff = -2909 minutes
> local date-time = 0100-02-28T14:00 diff = -1469 minutes
> local date-time = 0200-02-28T14:00 diff = -29 minutes
> local date-time = 0300-02-28T14:00 diff = 1410 minutes
> local date-time = 0500-02-28T14:00 diff = 2850 minutes
> local date-time = 0600-02-28T14:00 diff = 4290 minutes
> local date-time = 0700-02-28T14:00 diff = 5730 minutes
> local date-time = 0900-02-28T14:00 diff = 7170 minutes
> local date-time = 1000-02-28T14:00 diff = 8610 minutes
> local date-time = 1100-02-28T14:00 diff = 10050 minutes
> local date-time = 1300-02-28T14:00 diff = 11490 minutes
> local date-time = 1400-02-28T14:00 diff = 12930 minutes
> local date-time = 1500-02-28T14:00 diff = 14370 minutes
> local date-time = 1582-10-14T14:00 diff = -29 minutes
> local date-time = 1899-12-31T16:52:58 diff = 0 minutes
> local date-time = 1917-12-27T11:52:58 diff = 60 minutes
> local date-time = 1917-12-27T12:52:58 diff = 0 minutes
> local date-time = 1918-09-15T12:52:58 diff = 60 minutes
> local date-time = 1918-09-15T13:52:58 diff = 0 minutes
> local date-time = 1919-06-30T16:52:58 diff = 31 minutes
> local date-time = 1919-06-30T17:52:58 diff = 0 minutes
> local date-time = 1919-08-15T12:52:58 diff = 60 minutes
> local date-time = 1919-08-15T13:52:58 diff = 0 minutes
> local date-time = 1921-08-31T10:52:58 diff = 60 minutes
> local date-time = 1921-08-31T11:52:58 diff = 0 minutes
> local date-time = 1921-09-30T11:52:58 diff = 60 minutes
> local date-time = 1921-09-30T12:52:58 diff = 0 minutes
> local date-time = 1922-09-30T12:52:58 diff = 60 minutes
> local date-time = 1922-09-30T13:52:58 diff = 0 minutes
> local date-time = 1981-09-30T12:52:58 diff = 60 minutes
> local date-time = 1981-09-30T13:52:58 diff = 0 minutes
> local date-time = 1982-09-30T12:52:58 diff = 60 minutes
> local date-time = 1982-09-30T13:52:58 diff = 0 minutes
> local date-time = 1983-09-30T12:52:58 diff = 60 minutes
> local date-time = 1983-09-30T13:52:58 diff = 0 minutes
> local date-time = 1984-09-29T15:52:58 diff = 60 minutes
> local date-time = 1984-09-29T16:52:58 diff = 0 minutes
> local date-time = 1985-09-28T15:52:58 diff = 60 minutes
> local date-time = 1985-09-28T16:52:58 diff = 0 minutes
> local date-time = 1986-09-27T15:52:58 diff = 60 minutes
> local date-time = 1986-09-27T16:52:58 diff = 0 minutes
> local date-time = 1987-09-26T15:52:58 diff = 60 minutes
> local date-time = 1987-09-26T16:52:58 diff = 0 minutes
> local date-time = 1988-09-24T15:52:58 diff = 60 minutes
> local date-time = 1988-09-24T16:52:58 diff = 0 minutes
> local date-time = 1989-09-23T15:52:58 diff = 60 minutes
> local date-time = 1989-09-23T16:52:58 diff = 0 minutes
> local date-time = 1990-09-29T15:52:58 diff = 60 minutes
> local date-time = 1990-09-29T16:52:58 diff = 0 minutes
> local date-time = 1991-09-28T16:52:58 diff = 60 minutes
> local date-time = 1991-09-28T17:52:58 diff = 0 minutes
> local date-time = 1992-09-26T15:52:58 diff = 60 minutes
> local date-time = 1992-09-26T16:52:58 diff = 0 minutes
> local date-time = 1993-09-25T15:52:58 diff = 60 minutes
> local date-time = 1993-09-25T16:52:58 diff = 0
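
The scan in the quoted test only prints the switch points; the description proposes keeping them in a map instead. One way this could look is sketched below, under stated assumptions: `SwitchTableSketch` and `buildSwitchTable` are hypothetical names, and `rebase` stands in for any slow reference function such as the `rebaseGregorianToJulianMicros` call used in the test. The sketch collects the points and their diffs into two parallel sorted arrays that a binary search can consume later.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: build a (switch points, diffs) table by scanning.
object SwitchTableSketch {
  // Samples [start, end) every `step` units and records each sampled point
  // at which rebase(x) - x differs from the previously recorded diff.
  def buildSwitchTable(start: Long, end: Long, step: Long)(
      rebase: Long => Long): (Array[Long], Array[Long]) = {
    require(step > 0, "step must be positive")
    val switches = ArrayBuffer.empty[Long]
    val diffs = ArrayBuffer.empty[Long]
    var x = start
    while (x < end) {
      val curDiff = rebase(x) - x
      if (diffs.isEmpty || diffs.last != curDiff) {
        switches += x
        diffs += curDiff
      }
      x += step
    }
    (switches.toArray, diffs.toArray)
  }
}
```

As in the quoted test, a coarse step such as one hour only finds the first sampled point at or after each real switch, so a table intended for production use would presumably need the exact transition instants rather than the sampled ones.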