MaxGekk opened a new pull request #25998: [SPARK-29328][SQL] Fix calculation of mean seconds per month
URL: https://github.com/apache/spark/pull/25998
 
 
   ### What changes were proposed in this pull request?
   I introduced the new constants `SECONDS_PER_MONTH` and `MILLIS_PER_MONTH`, and reused them in the calculations of seconds/milliseconds per month. `SECONDS_PER_MONTH` is 2629746 because the average year of the Gregorian calendar is 365.2425 days long: 60 * 60 * 24 * 365.2425 = 31556952.0 seconds per year, and 31556952 / 12 = 2629746 seconds per month.
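
   A minimal sketch of the arithmetic behind the new constants (the constant names come from this PR; their exact placement in Spark's code and declared types are assumptions):
   ```scala
   // Mean Gregorian month, derived from the 365.2425-day mean year.
   val DAYS_PER_YEAR: Double = 365.2425
   val SECONDS_PER_YEAR: Long = (60L * 60 * 24 * DAYS_PER_YEAR).toLong // 31556952
   val SECONDS_PER_MONTH: Long = SECONDS_PER_YEAR / 12                 // 2629746
   val MILLIS_PER_MONTH: Long = SECONDS_PER_MONTH * 1000L              // 2629746000
   ```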
   
   ### Why are the changes needed?
   Spark uses the proleptic Gregorian calendar (see https://issues.apache.org/jira/browse/SPARK-26651), in which the average year is 365.2425 days long (see https://en.wikipedia.org/wiki/Gregorian_calendar), but the existing implementation assumes 31 days per month, or 12 * 31 = 372 days per year. That is far from the truth.
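
   To put a number on the error, here is a worked comparison (illustrative arithmetic, not code from the PR):
   ```scala
   // Seconds per month under the old 31-day assumption vs. the Gregorian mean.
   val oldSecondsPerMonth = 31L * 24 * 60 * 60                 // 2678400
   val newSecondsPerMonth = 2629746L                           // 31556952 / 12
   val errorSeconds = oldSecondsPerMonth - newSecondsPerMonth  // 48654, ~13.5 hours
   ```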
   
   ### Does this PR introduce any user-facing change?
    Yes, the changes affect at least 3 methods in `GroupStateImpl`, `EventTimeWatermark` and `MonthsBetween`. For example, the `months_between()` function will return different results in some cases, as shown below.
   
   Before:
   ```sql
   spark-sql> select months_between('2019-09-15', '1970-01-01');
   596.4516129
   ```
   After:
   ```sql
   spark-sql> select months_between('2019-09-15', '1970-01-01');
   596.45996838
   ```
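
   The fractional part of the new result can be reproduced from the mean-month constant (a sketch of the arithmetic only, not the actual `MonthsBetween` implementation):
   ```scala
   // 2019-09-15 is 596 whole months after 1970-01-01, plus 14 remaining days.
   val secondsPerMonth = 2629746L                  // the new SECONDS_PER_MONTH
   val remainderSeconds = 14L * 24 * 60 * 60       // 1209600 seconds in 14 days
   val result = 596 + remainderSeconds.toDouble / secondsPerMonth
   println(f"$result%.8f") // 596.45996838; the old code divided by 31 days instead
   ```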
   
   ### How was this patch tested?
   By the existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite` and `DateExpressionsSuite`.
   
