Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/23000#discussion_r234807971
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
---
@@ -410,6 +410,30 @@ class DateTimeUtilsSuite extends SparkFunSuite {
assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 78)
}
+ test("SPARK-26002: correct day of year calculations for Julian calendar years") {
+ TimeZone.setDefault(TimeZoneUTC)
+ val c = Calendar.getInstance(TimeZoneUTC)
+ c.set(Calendar.MILLISECOND, 0)
+ (1000 to 1600 by 100).foreach { year =>
+ // January 1 is the 1st day of year.
+ c.set(year, 0, 1, 0, 0, 0)
+ assert(getYear(getInUTCDays(c.getTimeInMillis)) === year)
+ assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 1)
+ assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 1)
+
+ // March 1 is the 61st day of the year, as these are all leap years. This holds
+ // even for the multiples of 100, because before 1582-10-4 the Julian calendar
+ // leap year rule is used, in which every multiple of 4 is a leap year.
+ c.set(year, 2, 1, 0, 0, 0)
+ assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 61)
+ assert(getMonth(getInUTCDays(c.getTimeInMillis)) === 3)
+
+ // For non-leap years:
+ c.set(year + 1, 2, 1, 0, 0, 0)
+ assert(getDayInYear(getInUTCDays(c.getTimeInMillis)) === 60)
+ }
--- End diff --
This is good, but I think it's worth adding checks for a couple of special cases:
* 1582-10-3
* 1582-10-14 (though I guess the meaning of "dayInYear" is not so clear in this case)
* 1600-01-01
* 1600-03-01
I think they'll all be OK after your change, but it's good to have a check.
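The special cases listed above can be sanity-checked outside Spark before turning them into suite assertions. The sketch below assumes the expected values should match `java.util.GregorianCalendar`, which is a hybrid calendar: Julian rules before its default cutover of 1582-10-15, Gregorian rules from then on. `HybridCalendarChecks`, `dayOfYear`, `exists` and `isHybridLeapYear` are hypothetical helpers, not Spark APIs. Because 1582-10-05 through 1582-10-14 were skipped at the cutover, the sketch checks that a non-lenient calendar rejects 1582-10-14 rather than asserting a day of the year for it.

```scala
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Hypothetical helpers (not part of Spark) for checking expected
// day-of-year values around the Julian/Gregorian cutover.
object HybridCalendarChecks {
  private val utc = TimeZone.getTimeZone("UTC")

  private def calendarFor(year: Int, month: Int, day: Int, lenient: Boolean): GregorianCalendar = {
    // java.util.GregorianCalendar uses Julian rules before its default
    // cutover (1582-10-15) and Gregorian rules from then on.
    val c = new GregorianCalendar(utc)
    c.clear()                    // reset all fields, including time of day
    c.setLenient(lenient)
    c.set(year, month - 1, day)  // Calendar months are 0-based
    c
  }

  // 1-based day of the year as GregorianCalendar computes it.
  def dayOfYear(year: Int, month: Int, day: Int): Int =
    calendarFor(year, month, day, lenient = true).get(Calendar.DAY_OF_YEAR)

  // False for dates skipped by the cutover (1582-10-05 to 1582-10-14):
  // a non-lenient calendar throws instead of shifting them to another date.
  def exists(year: Int, month: Int, day: Int): Boolean =
    try { calendarFor(year, month, day, lenient = false).getTimeInMillis; true }
    catch { case _: IllegalArgumentException => false }

  // Leap-year rule from the test comment: every multiple of 4 up to 1582,
  // then the Gregorian rule (century years only if divisible by 400).
  def isHybridLeapYear(year: Int): Boolean =
    if (year <= 1582) year % 4 == 0
    else (year % 4 == 0 && year % 100 != 0) || year % 400 == 0

  def main(args: Array[String]): Unit = {
    assert(dayOfYear(1582, 10, 3) == 276)   // Julian 1582 is not a leap year: 273 + 3
    assert(dayOfYear(1582, 10, 15) == 278)  // the day right after 1582-10-04 (day 277)
    assert(!exists(1582, 10, 14))           // skipped by the cutover
    assert(dayOfYear(1600, 1, 1) == 1)
    assert(dayOfYear(1600, 3, 1) == 61)     // 1600 is also a Gregorian leap year
    val gc = new GregorianCalendar(utc)
    assert((1000 to 1700).forall(y => gc.isLeapYear(y) == isHybridLeapYear(y)))
    println("all special cases OK")
  }
}
```

Each helper builds a fresh calendar per call, which keeps the sketch simple at the cost of some allocation. In `DateTimeUtilsSuite` the corresponding assertions would go through `getDayInYear(getInUTCDays(c.getTimeInMillis))`, as in the diff.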