Hi,

some days ago I ran into a problem with the to_date() function. I originally described it on StackExchange:
https://dba.stackexchange.com/questions/250111/unexpected-behaviour-for-to-date-with-week-number-and-week-day

The problem:

If you want to parse a date string with year, week and day of week, you can do this using the ISO week pattern: 'IYYY-IW-ID'. This works as expected:

date string |  to_date()
------------+------------
'2019-1-1' | 2018-12-31 -> Monday of the first week of the year (defined as the week that includes the 4th of January)
'2019-1-2'  |  2019-01-01
'2019-1-3'  |  2019-01-02
'2019-1-4'  |  2019-01-03
'2019-1-5'  |  2019-01-04
'2019-1-6'  |  2019-01-05
'2019-1-7'  |  2019-01-06

'2019-2-1'  |  2019-01-07
'2019-2-2'  |  2019-01-08

But if you are trying this with the non-ISO pattern 'YYYY-WW-D', the result was not expected:

date string |  to_date()
-------------------------
'2019-1-1'  |  2019-01-01
'2019-1-2'  |  2019-01-01
'2019-1-3'  |  2019-01-01
'2019-1-4'  |  2019-01-01
'2019-1-5'  |  2019-01-01
'2019-1-6'  |  2019-01-01
'2019-1-7'  |  2019-01-01

'2019-2-1'  |  2019-01-08
'2019-2-2'  |  2019-01-08

As you can see, the 'D' part of the pattern doesn't influence the resulting date.

The answer of Laurenz Albe pointed to a part of the documentation, I missed so far:

"In to_timestamp and to_date, weekday names or numbers (DAY, D, and related field types) are accepted but are ignored for purposes of computing the result. The same is true for quarter (Q) fields." (https://www.postgresql.org/docs/12/functions-formatting.html)

So, I had a look at the relevant code part. I decided to try a patch by myself. Now it works as I would expect it:

date string |  to_date()
-------------------------
'2019-1-1' | 2018-12-30 -> Sunday (!) of the first week of the year (the first week is at the first day of year)
'2019-1-2'  |  2018-12-31
'2019-1-3'  |  2019-01-01
'2019-1-4'  |  2019-01-02
'2019-1-5'  |  2019-01-03
'2019-1-6'  |  2019-01-04
'2019-1-7'  |  2019-01-05

'2019-2-1'  |  2019-01-06
'2019-2-2'  |  2019-01-07

Furthermore, if you left the 'D' part, the date would be always set to the first day of the corresponding week (in that case it is Sunday, in contrast to the ISO week, which starts mondays).

To be consistent, I added similar code for the week of month pattern ('W'). So, using the pattern 'YYYY-MM-W-D' yields in:

date string   |  to_date()
---------------------------
'2018-12-5-1' |  2018-12-23
'2018-12-6-1' |  2018-12-30
'2019-1-1-1' | 2018-12-30 -> First day (Su) of the first week of the first month of the year '2019-2-2-1' | 2019-02-03 -> First day (Su) of the second week of February '2019-10-3-5' | 2019-10-17 -> Fifth day (Th) of the third week of October

If you left the 'D', it would be set to 1 as well.

The code can be seen here:
https://github.com/S-Man42/postgres/commit/534e6bd70e23864f385d60ecf187496c7f4387c9

I hope, keeping the code style of the surrounding code (especially the ISO code) is ok for you.

Now the questions:
1. Although the ignorance of the 'D' pattern is well documented, does the new behaviour might be interesting for you?
2. Does it work as you'd expect it?
3. Because this could be my very first contribution to the PostgreSQL code base, I really want you to be as critical as possible. I am not quite sure if I didn't miss something important. 4. Currently something like '2019-1-8' does not throw an exception but results in the same as '2019-2-1' (8th is the same as the 1st of the next week). On the other hand, currently, the ISO week conversion gives out the result of '2019-1-7' for every 'D' >= 7. I am not sure if this is better. I think a consistent exception handling should be discussed separately (date roll over vs. out of range exception vs. ISO week behaviour)

So far, I am very curious about your opinions!

Kind regards,
Mark/S-Man42


Reply via email to