GitHub user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21196#discussion_r185634058
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
---
@@ -888,14 +888,19 @@ object DateTimeUtils {
val months1 = year1 * 12 + monthInYear1
val months2 = year2 * 12 + monthInYear2
+ val monthDiff = (months1 - months2).toDouble
+
if (dayInMonth1 == dayInMonth2 || ((daysToMonthEnd1 == 0) && (daysToMonthEnd2 == 0))) {
- return (months1 - months2).toDouble
+ return monthDiff
}
- // milliseconds is enough for 8 digits precision on the right side
- val timeInDay1 = millis1 - daysToMillis(date1, timeZone)
- val timeInDay2 = millis2 - daysToMillis(date2, timeZone)
- val timesBetween = (timeInDay1 - timeInDay2).toDouble / MILLIS_PER_DAY
- val diff = (months1 - months2).toDouble + (dayInMonth1 - dayInMonth2 + timesBetween) / 31.0
+ // using milliseconds can cause precision loss with more than 8 digits
+ // we follow Hive's implementation which uses seconds
--- End diff --
I checked Hive's implementation, and it works as this comment says: it uses seconds.
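For reference, here is a minimal sketch of the months-between formula visible in this hunk, with the time-of-day part taken in whole seconds as the new comment describes. It assumes each timestamp has already been split into a month index (year * 12 + month), day of month, a month-end flag, and seconds since midnight; `MonthsBetweenSketch` and `SplitTime` are hypothetical names, not code from Spark or Hive. The final rounding to 8 fractional digits is also an assumption, since that step is not visible in this diff.

```scala
// Hypothetical sketch only; not the code in this PR or in Hive.
object MonthsBetweenSketch {
  // Hypothetical holder for one timestamp already split into calendar parts.
  final case class SplitTime(
      months: Int,          // year * 12 + month-in-year, as in the diff
      dayInMonth: Int,      // 1-based day of month
      isMonthEnd: Boolean,  // true when the date is the last day of its month
      secondsInDay: Long)   // whole seconds since local midnight

  private val SecondsPerDay: Double = 24 * 60 * 60

  def monthsBetween(t1: SplitTime, t2: SplitTime): Double = {
    val monthDiff = (t1.months - t2.months).toDouble
    if (t1.dayInMonth == t2.dayInMonth || (t1.isMonthEnd && t2.isMonthEnd)) {
      // Same day of month, or both dates on a month end: whole months only.
      monthDiff
    } else {
      // Time-of-day difference in whole seconds, as a fraction of a day.
      val timesBetween = (t1.secondsInDay - t2.secondsInDay) / SecondsPerDay
      // Day difference plus the time fraction, over a fixed 31-day month.
      val diff = monthDiff + (t1.dayInMonth - t2.dayInMonth + timesBetween) / 31.0
      // Round to 8 fractional digits (assumed final step).
      math.round(diff * 1e8) / 1e8
    }
  }
}
```

With this split, one second of difference is roughly 1 / (86400 * 31), about 3.7e-7 months, which is still visible after rounding to 8 digits.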
---