MaxGekk commented on a change in pull request #25998: [SPARK-29328][SQL] Fix 
calculation of mean seconds per month
URL: https://github.com/apache/spark/pull/25998#discussion_r331360446
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala
 ##########
 @@ -28,9 +27,7 @@ object EventTimeWatermark {
   val delayKey = "spark.watermarkDelayMs"
 
   def getDelayMs(delay: CalendarInterval): Long = {
-    // We define month as `31 days` to simplify calculation.
-    val millisPerMonth = TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31
-    delay.milliseconds + delay.months * millisPerMonth
+    delay.milliseconds + delay.months * MILLIS_PER_MONTH
 
 Review comment:
   I believe this should be used in any place, including this one, where we
need a duration (in seconds or fractions of a second). The difference between
`months_between()` and this place is that `months_between` uses the month
length only to calculate the fraction of a month, so 28 or 31 days per month
doesn't really matter: it affects only the 2nd or 3rd digit of the fraction.
Here we operate on bigger numbers, where months add up to years, and it starts
to matter how many days we use per year. Let's say we calculate a duration of
10 years, which is 120 months. If we use 31 days per month, this duration is
31 * 120 = 10 * 372 = 3720 days, but if one year is 365.2425 days, then
10 years is about 3652 days. The difference is 3720 - 3652 = 68 days, so the
calculation error is more than 2 months. I believe that matters.
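
The arithmetic above can be checked with a small sketch. It is illustrative only and not the code from the PR: the object name, the constant names, and the helper `durationDays` are hypothetical, and the mean month is assumed to be one twelfth of a 365.2425-day year.

```scala
object MonthLengthDemo {
  // Fixed 31-day month, as in the code removed by the diff.
  val DaysPerMonthOld: Double = 31.0
  // Assumption: mean Gregorian year of 365.2425 days split evenly into 12 months.
  val DaysPerMonthMean: Double = 365.2425 / 12

  // Hypothetical helper: length in days of a duration given in months.
  def durationDays(months: Int, daysPerMonth: Double): Double =
    months * daysPerMonth

  def main(args: Array[String]): Unit = {
    val months = 120 // 10 years
    val oldDays = durationDays(months, DaysPerMonthOld)
    val meanDays = durationDays(months, DaysPerMonthMean)
    println(f"31-day months: $oldDays%.1f days")
    println(f"mean months:   $meanDays%.1f days")
    println(f"difference:    ${oldDays - meanDays}%.1f days")
  }
}
```

Under these assumptions the two definitions differ by roughly 68 days over 10 years, which is the error of more than two months described above.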

