On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > On 2020-06-30 06:34, John Naylor wrote: > > In v9, I've simplified the patch somewhat to make it easier for future > > work to build on. > > > > - When truncating on month-or-greater intervals, require the origin to > > align on month. This removes the need to handle weird corner cases > > that have no straightforward behavior. > > - Remove hackish and possibly broken code to allow origin to be after > > the input timestamp. The default origin is Jan 1, 1 AD, so only AD > > dates will behave correctly by default. This is not enforced for now, > > since it may be desirable to find a way to get this to work in a nicer > > way. > > - Rebase docs over PG13 formatting changes. > > This looks pretty solid now. Are there any more corner cases and other > areas with unclear behavior that you are aware of?
Hi Peter, Thanks for taking a look! I believe there are no known corner cases aside from not throwing an error if origin > input, but I'll revisit that when we are more firm on what features we want support. > A couple of thoughts: > > - After reading the discussion a few times, I'm not so sure anymore > whether making this a cousin of date_trunc is the right way to go. As > you mentioned, there are some behaviors specific to date_trunc that > don't really make sense in date_trunc_interval, and maybe we'll have > more of those. As far as the behaviors, I'm not sure exactly what you what you were thinking of, but here are two issues off the top of my head: - If the new functions are considered variants of date_trunc(), there is the expectation that the options work the same way, in particular the timezone parameter. You asked specifically about that below, so I'll address that separately. - In the "week" case, the boundary position depends on the origin, since a week-long interval is just 7 days. > Also, date_trunc_interval isn't exactly a handy name. > Maybe something to think about. It's obviously fairly straightforward > to change it. Effectively, it puts timestamps into bins, so maybe date_bin() or something like that? > - There were various issues with the stride interval having months and > years. I'm not sure we even need that. It could be omitted unless you > are confident that your implementation is now sufficient. Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millenia. > - Also, negative intervals could be prohibited, but I suppose that > matters less. Good for the sake of completeness. I think they happen to work in v9 by accident, but it would be better not to expose that. > - I'm curious about the origin being set to 0001-01-01. This seems to > work correctly in that it sets the origin to a Monday, which is what we > wanted, but according to Google that day was a Saturday. Something to > do with Julian vs. Gregorian calendar? Right, working backwards from our calendar today, it's Monday, but at the time it would theoretically be Saturday, barring leap year miscalculations. > Maybe we should choose a date > that is a bit more recent and easier to reason with. 2001-01-01 would also be a Monday aligned with centuries and millenia, so that would be my next suggestion. If we don't care to match with date_trunc() on those larger units, we could also use 1900-01-01, so the vast majority of dates in databases are after the origin. > - Then again, I'm thinking that maybe we should make the origin > mandatory. Otherwise, the default answers when having strides larger > than a day are entirely arbitrary, e.g., > > => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp); > 0190-01-01 00:00:00 BC > > => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp); > 0191-01-01 00:00:00 Right. In the first case, the default origin is also after the input, and crosses the AD/BC boundary. Tricky to get right. > Perhaps the origin could be defaulted if the interval is less than a day > or something like that. If we didn't allow months and years to be units, it seems the default would always make sense? > - I'm wondering whether we need the date_trunc_interval(interval, > timestamptz, timezone) variant. Isn't that the same as > date_trunc_interval(foo AT ZONE xyz, value)? I based this on 600b04d6b5ef6 for date_trunc(), whose message states: date_trunc(field, timestamptz, zone_name) is the same as date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company