subject:"truncating timestamps on arbitrary intervals"

Re: truncating timestamps on arbitrary intervals

2021-07-28 Thread John Naylor

On Wed, Jul 28, 2021 at 12:15 AM Michael Paquier 
wrote:
>
> On Tue, Jul 27, 2021 at 12:05:51PM -0400, John Naylor wrote:
> > Concretely, I propose to push the attached on master and v14. Since
we're
> > in beta 2 and this thread might not get much attention, I've CC'd the
RMT.
>
> (It looks like gmail has messed up a bit the format of your last
> message.)

Hmm, it looks fine in the archives.

> Hmm.  The docs say also the following thing, but your patch does not
> reflect that anymore:
> "Negative intervals are allowed and are treated the same as positive
> intervals."

I'd forgotten that was documented based on incomplete information, thanks
for looking! Pushed with that fixed.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-07-27 Thread Michael Paquier

On Tue, Jul 27, 2021 at 12:05:51PM -0400, John Naylor wrote:
> Concretely, I propose to push the attached on master and v14. Since we're
> in beta 2 and this thread might not get much attention, I've CC'd the RMT.

(It looks like gmail has messed up a bit the format of your last
message.)

Hmm.  The docs say also the following thing, but your patch does not
reflect that anymore:
"Negative intervals are allowed and are treated the same as positive
intervals."
So you may want to update that, at least.
--
Michael


signature.asc
Description: PGP signature

Re: truncating timestamps on arbitrary intervals

2021-07-27 Thread Tom Lane

John Naylor  writes:
> Concretely, I propose to push the attached on master and v14. Since we're
> in beta 2 and this thread might not get much attention, I've CC'd the RMT.

+1, we can figure out whether that has a use some other time.

regards, tom lane

Re: truncating timestamps on arbitrary intervals

2021-07-27 Thread John Naylor

I wrote:

> On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <
baurzhansahar...@gmail.com> wrote:
> >
> > > No, the boundary is intentionally the earlier one:
> >
> > I found that commit in GitHub, thanks for pointing it out.
> > When I test locally origin_in_the_future case I get different results
for positive and negative intervals (see queries #1 and #2 from above, they
have same timestamp, origin and interval magnitude, difference is only in
interval sign) - can it be that the version I downloaded from
https://www.enterprisedb.com/postgresql-early-experience doesn't include
commit with that improvement?
>
> Sorry, I wasn't clear. The intention is that the boundary is on the lower
side, but query #1 doesn't follow that, so that's a bug in my view. I found
while developing the feature that the sign of the stride didn't seem to
matter, but evidently I didn't try with the origin in the future.
>
> > >  I wonder if we should just disallow negative intervals here.
> >
> > I cannot imagine somebody using negative as a constant argument but
users can pass another column as a first argument date or some function(ts)
- not likely but possible. A line in docs about the leftmost point of
interval as start of the bin could be helpful.
>
> In recent years there have been at least two attempts to add an absolute
value function for intervals, and both stalled over semantics, so I'd
rather just side-step the issue, especially as we're in beta.

Concretely, I propose to push the attached on master and v14. Since we're
in beta 2 and this thread might not get much attention, I've CC'd the RMT.

--
John Naylor
EDB: http://www.enterprisedb.com


0001-Disallow-negative-strides-in-date_bin.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2021-07-23 Thread John Naylor

On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <
baurzhansahar...@gmail.com> wrote:
>
> > No, the boundary is intentionally the earlier one:
>
> I found that commit in GitHub, thanks for pointing it out.
> When I test locally origin_in_the_future case I get different results for
positive and negative intervals (see queries #1 and #2 from above, they
have same timestamp, origin and interval magnitude, difference is only in
interval sign) - can it be that the version I downloaded from
https://www.enterprisedb.com/postgresql-early-experience doesn't include
commit with that improvement?

Sorry, I wasn't clear. The intention is that the boundary is on the lower
side, but query #1 doesn't follow that, so that's a bug in my view. I found
while developing the feature that the sign of the stride didn't seem to
matter, but evidently I didn't try with the origin in the future.

> >  I wonder if we should just disallow negative intervals here.
>
> I cannot imagine somebody using negative as a constant argument but users
can pass another column as a first argument date or some function(ts) - not
likely but possible. A line in docs about the leftmost point of interval as
start of the bin could be helpful.

In recent years there have been at least two attempts to add an absolute
value function for intervals, and both stalled over semantics, so I'd
rather just side-step the issue, especially as we're in beta.

> >In the case of full units (1 minute, 1 hour, etc.), it gives the same
result as the analogous date_trunc call,
> Was not obvious to me that we need to supply Monday origin to make
date_bin(1 week, ts) produce same result with date_trunc

The docs for date_trunc() don't mention this explicitly, but it might be
worth mentioning ISO weeks. There is a nearby mention for EXTRACT():

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

"The number of the ISO 8601 week-numbering week of the year. By definition,
ISO weeks start on Mondays and the first week of a year contains January 4
of that year. In other words, the first Thursday of a year is in week 1 of
that year."

> Sorry for the verbose report and thanks for the nice function -  I know
it's not yet released, was just playing around with beta as I want to align
CrateDB date_bin with Postgresql

Thanks again for testing! This is good feedback.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread John Naylor

On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <
baurzhansahar...@gmail.com> wrote:
> Not related to negative interval - I created a PR for adding zero check
for stride https://github.com/postgres/postgres/pull/67 and after getting
it closed I stopped right there - 1 line check doesn't worth going through
the patching process I'm not familiar with.

Thanks for the pull request! I added tests and reworded the error message
slightly to match current style, and pushed.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread Bauyrzhan Sakhariyev

> No, the boundary is intentionally the earlier one:

I found that commit in GitHub, thanks for pointing it out.
When I test locally *origin_in_the_future *case I get different results for
positive and negative intervals (see queries #1 and #2 from above, they
have same timestamp, origin and interval magnitude, difference is only in
interval sign) - can it be that the version I downloaded from
https://www.enterprisedb.com/postgresql-early-experience doesn't include
commit with that improvement?

>  I wonder if we should just disallow negative intervals here.

I cannot imagine somebody using negative as a constant argument but users
can pass another column as a first argument date or some function(ts) - not
likely but possible. A line in docs about the leftmost point of interval as
start of the bin could be helpful.

Not related to negative interval - I created a PR for adding zero check for
stride https://github.com/postgres/postgres/pull/67 and after getting it
closed I stopped right there - 1 line check doesn't worth going through the
patching process I'm not familiar with.

>In the case of full units (1 minute, 1 hour, etc.), it gives the same
result as the analogous date_trunc call,
Was not obvious to me that we need to supply Monday origin to make
date_bin(1 week, ts) produce same result with date_trunc

Sorry for the verbose report and thanks for the nice function -  I know
it's not yet released, was just playing around with beta as I want to
align CrateDB
date_bin  with Postgresql

On Thu, Jul 22, 2021 at 7:28 PM John Naylor 
wrote:

>
> On Thu, Jul 22, 2021 at 12:24 PM Bauyrzhan Sakhariyev <
> baurzhansahar...@gmail.com> wrote:
> >
> > Is date_bin supposed to return the beginning of the bin?
>
> Thanks for testing! And yes.
>
> > And does the sign of an interval define the "direction" of the bin?
>
> No, the boundary is intentionally the earlier one:
>
> /*
>  * Make sure the returned timestamp is at the start of the bin, even if
>  * the origin is in the future.
>  */
> if (origin > timestamp && stride_usecs > 1)
> tm_delta -= stride_usecs;
>
> I wonder if we should just disallow negative intervals here.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>

Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread John Naylor

On Thu, Jul 22, 2021 at 12:24 PM Bauyrzhan Sakhariyev <
baurzhansahar...@gmail.com> wrote:
>
> Is date_bin supposed to return the beginning of the bin?

Thanks for testing! And yes.

> And does the sign of an interval define the "direction" of the bin?

No, the boundary is intentionally the earlier one:

/*
 * Make sure the returned timestamp is at the start of the bin, even if
 * the origin is in the future.
 */
if (origin > timestamp && stride_usecs > 1)
tm_delta -= stride_usecs;

I wonder if we should just disallow negative intervals here.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread Bauyrzhan Sakhariyev

Is date_bin supposed to return the beginning of the bin? And does the sign
of an interval define the "direction" of the bin?
Judging by results of queries #1 and #2, sign of interval decides a
direction timestamp gets shifted to (in both cases ts < origin)
but when ts >origin (queries #3 and #4) interval sign doesn't matter,
specifically #4 doesn't return 6-th of January.

1. SELECT date_bin('-2 days'::interval, timestamp '2001-01-01
00:00:00', timestamp
'2001-01-04 00:00:00'); -- 2001-01-02 00:00:00
2. SELECT date_bin('2 days'::interval, timestamp '2001-01-01
00:00:00', timestamp
'2001-01-04 00:00:00'); -- 2000-12-31 00:00:00
3. SELECT date_bin('2 days'::interval, timestamp '2001-01-04
00:00:00', timestamp
'2001-01-01 00:00:00'); -- 2001-01-03 00:00:00
4. SELECT date_bin('-2 days'::interval, timestamp '2001-01-04
00:00:00', timestamp
'2001-01-01 00:00:00'); -- 2001-01-03 00:00:00

On Thu, Jul 22, 2021 at 6:21 PM John Naylor 
wrote:

> Hi,
>
> When analyzing time-series data, it's useful to be able to bin
> timestamps into equally spaced ranges. date_trunc() is only able to
> bin on a specified whole unit. In the attached patch for the March
> commitfest, I propose a new function date_trunc_interval(), which can
> truncate to arbitrary intervals, e.g.:
>
> select date_trunc_interval('15 minutes', timestamp '2020-02-16
> 20:48:40'); date_trunc_interval
> -
>  2020-02-16 20:45:00
> (1 row)
>
> With this addition, it might be possible to turn the existing
> date_trunc() functions into wrappers. I haven't done that here because
> it didn't seem practical at this point. For one, the existing
> functions have special treatment for weeks, centuries, and millennia.
>
> Note: I've only written the implementation for the type timestamp
> without timezone. Adding timezone support would be pretty simple, but
> I wanted to get feedback on the basic idea first before making it
> complete. I've also written tests and very basic documentation.
>
> --
> John Naylorhttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Re: truncating timestamps on arbitrary intervals

2021-04-23 Thread Peter Eisentraut


On 22.04.21 11:16, Justin Pryzby wrote:

It looks like we all missed that I misspelled "date_bin" as
"date_trunc"...sorry.  I will include this with my next round of doc review, in
case you don't want to make a separate commit for it.


fixed

Re: truncating timestamps on arbitrary intervals

2021-04-22 Thread Justin Pryzby

On Fri, Apr 09, 2021 at 10:02:47PM +0200, Peter Eisentraut wrote:
> On 30.03.21 18:50, John Naylor wrote:
> > On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby wrote:
> >  >
> >  > The current docs seem to be missing a "synopsis", like
> >  >
> >  > +
> >  > +date_trunc(stride, 
> > timestamp, origin)
> >  > +
> > 
> > The attached
> > - adds a synopsis
> > - adds a bit more description to the parameters similar to those in
> > date_trunc
> > - documents that negative intervals are treated the same as positive ones
> > 
> > Note on the last point: This just falls out of the math, so was not
> > deliberate, but it seems fine to me. We could ban negative intervals,
> > but that would possibly just inconvenience some people unnecessarily. We
> > could also treat negative strides differently somehow, but I don't
> > immediately see a useful and/or intuitive change in behavior to come of
> > that.
> 
> committed

It looks like we all missed that I misspelled "date_bin" as
"date_trunc"...sorry.  I will include this with my next round of doc review, in
case you don't want to make a separate commit for it.

https://www.postgresql.org/docs/devel/functions-datetime.html#FUNCTIONS-DATETIME-BIN

>From f4eab5c0f908d868540ab33aa12b82fd05f19f52 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 22 Apr 2021 03:37:18 -0500
Subject: [PATCH] date_bin: fixup for added documentation in 49fb4e

---
 doc/src/sgml/func.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 53f4c09c81..cc4e1b0a36 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9946,13 +9946,13 @@ date_trunc(stride, 
timestamp
 
-date_trunc(stride, 
source, origin)
+date_bin(stride, source, 
origin)
 
 source is a value expression of type
 timestamp or timestamp with time zone.  (Values
 of type date are cast automatically to
 timestamp.)  stride is a value
-expression of type  interval.  The return value is likewise
+expression of type interval.  The return value is likewise
 of type timestamp or timestamp with time zone,
 and it marks the beginning of the bin into which the
 source is placed.
-- 
2.17.0

Re: truncating timestamps on arbitrary intervals

2021-04-10 Thread Peter Eisentraut


On 10.04.21 14:53, John Naylor wrote:


On Sat, Apr 10, 2021 at 7:43 AM Peter Eisentraut 
> wrote:

 >
 > On 30.03.21 18:06, John Naylor wrote:
 > > Currently, when the origin is after the input, the result is the
 > > timestamp at the end of the bin, rather than the beginning as expected.
 > > The attached puts the result consistently at the beginning of the bin.
 >
 > In the patch
 >
 > +   if (origin > timestamp && stride_usecs > 1)
 > +       tm_delta -= stride_usecs;
 >
 > is the condition stride_usecs > 1 really necessary?  My assessment is
 > that it's not, in which case it would be better to omit it.

Without the condition, the case of 1 microsecond will fail to be a 
no-op. This case has no practical use, but it still must work correctly, 
just as date_trunc('microsecond', input) does.


Ah yes, the tests cover that.  Committed.

Re: truncating timestamps on arbitrary intervals

2021-04-10 Thread John Naylor

On Sat, Apr 10, 2021 at 7:43 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> On 30.03.21 18:06, John Naylor wrote:
> > Currently, when the origin is after the input, the result is the
> > timestamp at the end of the bin, rather than the beginning as expected.
> > The attached puts the result consistently at the beginning of the bin.
>
> In the patch
>
> +   if (origin > timestamp && stride_usecs > 1)
> +   tm_delta -= stride_usecs;
>
> is the condition stride_usecs > 1 really necessary?  My assessment is
> that it's not, in which case it would be better to omit it.

Without the condition, the case of 1 microsecond will fail to be a no-op.
This case has no practical use, but it still must work correctly, just as
date_trunc('microsecond', input) does.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-04-10 Thread Peter Eisentraut


On 30.03.21 18:06, John Naylor wrote:
Currently, when the origin is after the input, the result is the 
timestamp at the end of the bin, rather than the beginning as expected. 
The attached puts the result consistently at the beginning of the bin.


In the patch

+   if (origin > timestamp && stride_usecs > 1)
+   tm_delta -= stride_usecs;

is the condition stride_usecs > 1 really necessary?  My assessment is 
that it's not, in which case it would be better to omit it.

Re: truncating timestamps on arbitrary intervals

2021-04-09 Thread Peter Eisentraut

On 30.03.21 18:50, John Naylor wrote:
On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby > wrote:

 >
 > The current docs seem to be missing a "synopsis", like
 >
 > +
 > +date_trunc(stride, 
timestamp, origin)

 > +

The attached
- adds a synopsis
- adds a bit more description to the parameters similar to those in 
date_trunc

- documents that negative intervals are treated the same as positive ones

Note on the last point: This just falls out of the math, so was not 
deliberate, but it seems fine to me. We could ban negative intervals, 
but that would possibly just inconvenience some people unnecessarily. We 
could also treat negative strides differently somehow, but I don't 
immediately see a useful and/or intuitive change in behavior to come of 
that.

committed

Re: truncating timestamps on arbitrary intervals

2021-04-01 Thread John Naylor

On Thu, Apr 1, 2021 at 9:11 AM Salek Talangi 
wrote:
>
> Hi all,
>
> it might be a bit late now, but do you know that TimescaleDB already has
a similar feature, named time_bucket?
> https://docs.timescale.com/latest/api#time_bucket
> Perhaps that can help with some design decisions.

Yes, thanks I'm aware of it. It's a bit more feature-rich, and I wanted to
have something basic that users can have available without installing an
extension.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-04-01 Thread Salek Talangi

Hi all,

it might be a bit late now, but do you know that TimescaleDB already has a
similar feature, named time_bucket?
https://docs.timescale.com/latest/api#time_bucket
Perhaps that can help with some design decisions.
I saw your feature on Depesz' "Waiting for PostgreSQL 14" and remembered
reading about it just two days ago.

Best regards
Salek Talangi

Am Do., 1. Apr. 2021 um 13:31 Uhr schrieb John Naylor <
john.nay...@enterprisedb.com>:

> On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby 
> wrote:
> >
> > The current docs seem to be missing a "synopsis", like
> >
> > +
> > +date_trunc(stride,
> timestamp, origin)
> > +
>
> The attached
> - adds a synopsis
> - adds a bit more description to the parameters similar to those in
> date_trunc
> - documents that negative intervals are treated the same as positive ones
>
> Note on the last point: This just falls out of the math, so was not
> deliberate, but it seems fine to me. We could ban negative intervals, but
> that would possibly just inconvenience some people unnecessarily. We could
> also treat negative strides differently somehow, but I don't immediately
> see a useful and/or intuitive change in behavior to come of that.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>

Re: truncating timestamps on arbitrary intervals

2021-03-30 Thread John Naylor

On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby  wrote:
>
> The current docs seem to be missing a "synopsis", like
>
> +
> +date_trunc(stride,
timestamp, origin)
> +

The attached
- adds a synopsis
- adds a bit more description to the parameters similar to those in
date_trunc
- documents that negative intervals are treated the same as positive ones

Note on the last point: This just falls out of the math, so was not
deliberate, but it seems fine to me. We could ban negative intervals, but
that would possibly just inconvenience some people unnecessarily. We could
also treat negative strides differently somehow, but I don't immediately
see a useful and/or intuitive change in behavior to come of that.

--
John Naylor
EDB: http://www.enterprisedb.com

synopsis-and-add-to-description.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2021-03-30 Thread John Naylor

Currently, when the origin is after the input, the result is the timestamp
at the end of the bin, rather than the beginning as expected. The attached
puts the result consistently at the beginning of the bin.

--
John Naylor
EDB: http://www.enterprisedb.com


rationalize-future-origin.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2021-03-27 Thread Justin Pryzby

On Wed, Mar 24, 2021 at 08:50:59PM +0100, Peter Eisentraut wrote:
> On 24.03.21 18:58, John Naylor wrote:
> >  > As a potential follow-up, should we perhaps add named arguments?  That
> >  > might make the invocations easier to read, depending on taste.
> > 
> > I think it's quite possible some users will prefer that. All we need is
> > to add something like
> > 
> > proargnames => '{bin_width,input,origin}'
> > 
> > to the catalog, right?
> 
> right, plus some documentation adjustments perhaps

+1

The current docs seem to be missing a "synopsis", like

+
+date_trunc(stride, 
timestamp, origin)
+

-- 
Justin

Re: truncating timestamps on arbitrary intervals

2021-03-24 Thread Peter Eisentraut

On 24.03.21 18:58, John Naylor wrote:

 > As a potential follow-up, should we perhaps add named arguments?  That
 > might make the invocations easier to read, depending on taste.

I think it's quite possible some users will prefer that. All we need is 
to add something like

proargnames => '{bin_width,input,origin}'

to the catalog, right?

right, plus some documentation adjustments perhaps

Also, I noticed that I put in double semicolons in the new functions 
somehow. I'll fix that as well.

I have fixed that.

Re: truncating timestamps on arbitrary intervals

2021-03-24 Thread Peter Eisentraut


On 24.03.21 18:25, Erik Rijkers wrote:

On 2021.03.24. 16:38 Peter Eisentraut  wrote:




Committed.



'In cases full units' seems strange.


fixed, thanks

Re: truncating timestamps on arbitrary intervals

2021-03-24 Thread John Naylor

On Wed, Mar 24, 2021 at 1:25 PM Erik Rijkers  wrote:
>
> 'In cases full units' seems strange.
>
> Not a native speaker but maybe the attached changes are improvements?

-In cases full units (1 minute, 1 hour, etc.), it gives the same result
as
+In case of full units (1 minute, 1 hour, etc.), it gives the same
result as
 the analogous date_trunc call, but the difference
is
 that date_bin can truncate to an arbitrary
interval.


I would say "In the case of"


-The stride interval cannot contain units of
month
+The stride interval cannot contain units of a
month
 or larger.

The original seems fine to me here.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-03-24 Thread John Naylor

On Wed, Mar 24, 2021 at 11:38 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> Committed.
>
> I noticed that some of the documentation disappeared between v9 and v10.
>   So I put that back and updated it appropriately.  I also added a few
> more test cases to cover some things that have been discussed during the
> course of this thread.

Thanks! I put off updating the documentation in case the latest approach
was not feature-rich enough.

> As a potential follow-up, should we perhaps add named arguments?  That
> might make the invocations easier to read, depending on taste.

I think it's quite possible some users will prefer that. All we need is to
add something like

proargnames => '{bin_width,input,origin}'

to the catalog, right?

Also, I noticed that I put in double semicolons in the new functions
somehow. I'll fix that as well.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

2021-03-24 Thread Erik Rijkers

> On 2021.03.24. 16:38 Peter Eisentraut  
> wrote:

> 
> Committed.
> 

'In cases full units' seems strange.

Not a native speaker but maybe the attached changes are improvements?


Erik Rijkers--- ./doc/src/sgml/func.sgml.orig	2021-03-24 18:16:01.269515354 +0100
+++ ./doc/src/sgml/func.sgml	2021-03-24 18:18:31.695819520 +0100
@@ -9907,13 +9907,13 @@

 

-In cases full units (1 minute, 1 hour, etc.), it gives the same result as
+In case of full units (1 minute, 1 hour, etc.), it gives the same result as
 the analogous date_trunc call, but the difference is
 that date_bin can truncate to an arbitrary interval.

 

-The stride interval cannot contain units of month
+The stride interval cannot contain units of a month
 or larger.

Re: truncating timestamps on arbitrary intervals

2021-03-24 Thread Peter Eisentraut

On 18.01.21 21:54, John Naylor wrote:
On Mon, Nov 23, 2020 at 1:44 PM John Naylor 
mailto:john.nay...@enterprisedb.com>> wrote:

 >
 > On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut 
> wrote:

 > > - After reading the discussion a few times, I'm not so sure anymore
 > > whether making this a cousin of date_trunc is the right way to go.  As
 > > you mentioned, there are some behaviors specific to date_trunc that
 > > don't really make sense in date_trunc_interval, and maybe we'll have
 > > more of those.

For v10, I simplified the behavior by decoupling the behavior from 
date_trunc() and putting in some restrictions as discussed earlier. It's 
much simpler now. It could be argued that it goes too far in that 
direction, but it's easy to reason about and we can put back some 
features as we see fit.

Committed.

I noticed that some of the documentation disappeared between v9 and v10. 
 So I put that back and updated it appropriately.  I also added a few 
more test cases to cover some things that have been discussed during the 
course of this thread.

As a potential follow-up, should we perhaps add named arguments?  That 
might make the invocations easier to read, depending on taste.

Re: truncating timestamps on arbitrary intervals

2021-03-19 Thread David Steele

On 1/18/21 3:54 PM, John Naylor wrote:
On Mon, Nov 23, 2020 at 1:44 PM John Naylor 
mailto:john.nay...@enterprisedb.com>> wrote:

 >
 > On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut 
> wrote:

 > > - After reading the discussion a few times, I'm not so sure anymore
 > > whether making this a cousin of date_trunc is the right way to go.  As
 > > you mentioned, there are some behaviors specific to date_trunc that
 > > don't really make sense in date_trunc_interval, and maybe we'll have
 > > more of those.

For v10, I simplified the behavior by decoupling the behavior from 
date_trunc() and putting in some restrictions as discussed earlier. It's 
much simpler now. It could be argued that it goes too far in that 
direction, but it's easy to reason about and we can put back some 
features as we see fit.

Peter, thoughts on the new patch?

Regards,
--
-David
da...@pgmasters.net

Re: truncating timestamps on arbitrary intervals

2021-01-18 Thread John Naylor

On Mon, Nov 23, 2020 at 1:44 PM John Naylor 
wrote:
>
> On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
> > - After reading the discussion a few times, I'm not so sure anymore
> > whether making this a cousin of date_trunc is the right way to go.  As
> > you mentioned, there are some behaviors specific to date_trunc that
> > don't really make sense in date_trunc_interval, and maybe we'll have
> > more of those.

For v10, I simplified the behavior by decoupling the behavior from
date_trunc() and putting in some restrictions as discussed earlier. It's
much simpler now. It could be argued that it goes too far in that
direction, but it's easy to reason about and we can put back some features
as we see fit.

> > Also, date_trunc_interval isn't exactly a handy name.
> > Maybe something to think about.  It's obviously fairly straightforward
> > to change it.
>
> Effectively, it puts timestamps into bins, so maybe date_bin() or
something like that?

For v10 I went with date_bin() so we can see how that looks.

> > - There were various issues with the stride interval having months and
> > years.  I'm not sure we even need that.  It could be omitted unless you
> > are confident that your implementation is now sufficient.
>
> Months and years were a bit tricky, so I'd be happy to leave that out if
there is not much demand for it. date_trunc() already has quarters,
decades, centuries, and millenia.

I removed months and years for this version, but that can be reconsidered
of course. The logic is really simple now.

> > - Also, negative intervals could be prohibited, but I suppose that
> > matters less.

I didn't go this far, but probably should before long.

> > - Then again, I'm thinking that maybe we should make the origin
> > mandatory.  Otherwise, the default answers when having strides larger
> > than a day are entirely arbitrary, e.g.,

I've tried this and like the resulting simplification.

> > - I'm wondering whether we need the date_trunc_interval(interval,
> > timestamptz, timezone) variant.  Isn't that the same as
> > date_trunc_interval(foo AT ZONE xyz, value)?
>
> I based this on 600b04d6b5ef6 for date_trunc(), whose message states:
>
> date_trunc(field, timestamptz, zone_name)
>
> is the same as
>
> date_trunc(field, timestamptz at time zone zone_name) at time zone
zone_name
>
> so without the shorthand, you need to specify the timezone twice, once
for the calculation, and once for the output.

In light of making the origin mandatory, it no longer makes sense to have a
time zone parameter, since we can specify the time zone on the origin; and
if desired on the output as well.

--
John Naylor
EDB: http://www.enterprisedb.com


v10-datetrunc-interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-11-23 Thread John Naylor

On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> On 2020-06-30 06:34, John Naylor wrote:
> > In v9, I've simplified the patch somewhat to make it easier for future
> > work to build on.
> >
> > - When truncating on month-or-greater intervals, require the origin to
> > align on month. This removes the need to handle weird corner cases
> > that have no straightforward behavior.
> > - Remove hackish and possibly broken code to allow origin to be after
> > the input timestamp. The default origin is Jan 1, 1 AD, so only AD
> > dates will behave correctly by default. This is not enforced for now,
> > since it may be desirable to find a way to get this to work in a nicer
> > way.
> > - Rebase docs over PG13 formatting changes.
>
> This looks pretty solid now.  Are there any more corner cases and other
> areas with unclear behavior that you are aware of?

Hi Peter,

Thanks for taking a look!

I believe there are no known corner cases aside from not throwing an error
if origin > input, but I'll revisit that when we are more firm on what
features we want support.

> A couple of thoughts:
>
> - After reading the discussion a few times, I'm not so sure anymore
> whether making this a cousin of date_trunc is the right way to go.  As
> you mentioned, there are some behaviors specific to date_trunc that
> don't really make sense in date_trunc_interval, and maybe we'll have
> more of those.

As far as the behaviors, I'm not sure exactly what you what you were
thinking of, but here are two issues off the top of my head:

- If the new functions are considered variants of date_trunc(), there is
the expectation that the options work the same way, in particular the
timezone parameter. You asked specifically about that below, so I'll
address that separately.
- In the "week" case, the boundary position depends on the origin, since a
week-long interval is just 7 days.

> Also, date_trunc_interval isn't exactly a handy name.
> Maybe something to think about.  It's obviously fairly straightforward
> to change it.

Effectively, it puts timestamps into bins, so maybe date_bin() or something
like that?

> - There were various issues with the stride interval having months and
> years.  I'm not sure we even need that.  It could be omitted unless you
> are confident that your implementation is now sufficient.

Months and years were a bit tricky, so I'd be happy to leave that out if
there is not much demand for it. date_trunc() already has quarters,
decades, centuries, and millenia.

> - Also, negative intervals could be prohibited, but I suppose that
> matters less.

Good for the sake of completeness. I think they happen to work in v9 by
accident, but it would be better not to expose that.

> - I'm curious about the origin being set to 0001-01-01.  This seems to
> work correctly in that it sets the origin to a Monday, which is what we
> wanted, but according to Google that day was a Saturday.  Something to
> do with Julian vs. Gregorian calendar?

Right, working backwards from our calendar today, it's Monday, but at the
time it would theoretically be Saturday, barring leap year miscalculations.

> Maybe we should choose a date
> that is a bit more recent and easier to reason with.

2001-01-01 would also be a Monday aligned with centuries and millenia, so
that would be my next suggestion. If we don't care to match with
date_trunc() on those larger units, we could also use 1900-01-01, so the
vast majority of dates in databases are after the origin.

> - Then again, I'm thinking that maybe we should make the origin
> mandatory.  Otherwise, the default answers when having strides larger
> than a day are entirely arbitrary, e.g.,
>
> => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
> 0190-01-01 00:00:00 BC
>
> => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
> 0191-01-01 00:00:00

Right. In the first case, the default origin is also after the input, and
crosses the AD/BC boundary. Tricky to get right.

> Perhaps the origin could be defaulted if the interval is less than a day
> or something like that.

If we didn't allow months and years to be units, it seems the default would
always make sense?

> - I'm wondering whether we need the date_trunc_interval(interval,
> timestamptz, timezone) variant.  Isn't that the same as
> date_trunc_interval(foo AT ZONE xyz, value)?

I based this on 600b04d6b5ef6 for date_trunc(), whose message states:

date_trunc(field, timestamptz, zone_name)

is the same as

date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name

so without the shorthand, you need to specify the timezone twice, once for
the calculation, and once for the output.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: truncating timestamps on arbitrary intervals

2020-11-12 Thread Peter Eisentraut


On 2020-06-30 06:34, John Naylor wrote:

In v9, I've simplified the patch somewhat to make it easier for future
work to build on.

- When truncating on month-or-greater intervals, require the origin to
align on month. This removes the need to handle weird corner cases
that have no straightforward behavior.
- Remove hackish and possibly broken code to allow origin to be after
the input timestamp. The default origin is Jan 1, 1 AD, so only AD
dates will behave correctly by default. This is not enforced for now,
since it may be desirable to find a way to get this to work in a nicer
way.
- Rebase docs over PG13 formatting changes.


This looks pretty solid now.  Are there any more corner cases and other 
areas with unclear behavior that you are aware of?


A couple of thoughts:

- After reading the discussion a few times, I'm not so sure anymore 
whether making this a cousin of date_trunc is the right way to go.  As 
you mentioned, there are some behaviors specific to date_trunc that 
don't really make sense in date_trunc_interval, and maybe we'll have 
more of those.  Also, date_trunc_interval isn't exactly a handy name. 
Maybe something to think about.  It's obviously fairly straightforward 
to change it.


- There were various issues with the stride interval having months and 
years.  I'm not sure we even need that.  It could be omitted unless you 
are confident that your implementation is now sufficient.


- Also, negative intervals could be prohibited, but I suppose that 
matters less.


- I'm curious about the origin being set to 0001-01-01.  This seems to 
work correctly in that it sets the origin to a Monday, which is what we 
wanted, but according to Google that day was a Saturday.  Something to 
do with Julian vs. Gregorian calendar?  Maybe we should choose a date 
that is a bit more recent and easier to reason with.


- Then again, I'm thinking that maybe we should make the origin 
mandatory.  Otherwise, the default answers when having strides larger 
than a day are entirely arbitrary, e.g.,


=> select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
0190-01-01 00:00:00 BC

=> select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
0191-01-01 00:00:00

Perhaps the origin could be defaulted if the interval is less than a day 
or something like that.


- I'm wondering whether we need the date_trunc_interval(interval, 
timestamptz, timezone) variant.  Isn't that the same as 
date_trunc_interval(foo AT ZONE xyz, value)?

Re: truncating timestamps on arbitrary intervals

2020-06-29 Thread John Naylor

In v9, I've simplified the patch somewhat to make it easier for future
work to build on.

- When truncating on month-or-greater intervals, require the origin to
align on month. This removes the need to handle weird corner cases
that have no straightforward behavior.
- Remove hackish and possibly broken code to allow origin to be after
the input timestamp. The default origin is Jan 1, 1 AD, so only AD
dates will behave correctly by default. This is not enforced for now,
since it may be desirable to find a way to get this to work in a nicer
way.
- Rebase docs over PG13 formatting changes.



--
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v9-datetrunc-interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-04-02 Thread John Naylor

On Tue, Mar 31, 2020 at 4:34 PM Artur Zakirov  wrote:
> Thank you for new version of the patch.

Thanks for taking a look! Attached is v8, which addresses your points,
adds tests and fixes some bugs. There are still some WIP detritus in
the timezone code, so I'm not claiming it's ready, but it's much
closer. I'm fairly confident in the implementation of timestamp
without time zone, however.

> I'm not sure that I fully understand the 'origin' parameter. Is it valid
> to have a value of 'origin' which is greater than a value of 'timestamp'
> parameter?

That is the intention. The returned values should be

origin +/- (n * interval)

where n is an integer.

> I get some different results in such case:
>
> =# select date_trunc_interval('2 year', timestamp '2020-01-16 20:38:40',
> timestamp '2022-01-17 00:00:00');
>   date_trunc_interval
> -
>   2020-01-01 00:00:00

This was correct per how I coded it, but I have rethought where to
draw the bins for user-specified origins. I have decided that the
above is inconsistent with units smaller than a month. We shouldn't
"cross" the bin until the input has reached Jan. 17, in this case. In
v8, the answer to the above is

 date_trunc_interval
-
 2018-01-17 00:00:00
(1 row)

(This could probably be better documented)

> =# select date_trunc_interval('3 year', timestamp '2020-01-16 20:38:40',
 timestamp '2022-01-17 00:00:00');
>   date_trunc_interval
> -
>   2022-01-01 00:00:00
>
> So here I'm not sure which result is correct.

This one is just plain broken. The result should always be equal or
earlier than the input. In v8 the result is now:

 date_trunc_interval
-
 2019-01-17 00:00:00
(1 row)

> It seems that the patch is still in progress, but I have some nitpicking.
>
> > +
> > date_trunc_interval(interval, 
> > timestamptz, text)
> > +timestamptz  
>
> It seems that 'timestamptz' in both argument and result descriptions
> should be replaced by 'timestamp with time zone' (see other functions
> descriptions). Though it is okay to use 'timestamptz' in SQL examples.

Any and all nitpicks welcome! I have made these match the existing
date_trunc documentation more closely.

> timestamp_trunc_interval_internal() and
> timestamptz_trunc_interval_internal() have similar code. I think they
> can be rewritten to avoid code duplication.

I thought so too (and noticed the same about the existing date_trunc),
but it's more difficult than it looks.

Note: I copied some tests from timestamp to timestamptz with a few
tweaks. A few tz tests still don't pass. I'm not yet sure if the
problem is in the test, or my code. Some detailed review of the tests
and their results would be helpful.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v8-datetrunc_interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-03-31 Thread Artur Zakirov


On 3/30/2020 9:30 PM, John Naylor wrote:

I attempted this in the attached v7. There are 4 new functions for
truncating timestamptz on an interval -- with and without origin, and
with and without time zone.


Thank you for new version of the patch.

I'm not sure that I fully understand the 'origin' parameter. Is it valid 
to have a value of 'origin' which is greater than a value of 'timestamp' 
parameter?


I get some different results in such case:

=# select date_trunc_interval('2 year', timestamp '2020-01-16 20:38:40', 
timestamp '2022-01-17 00:00:00');

 date_trunc_interval
-
 2020-01-01 00:00:00

=# select date_trunc_interval('3 year', timestamp '2020-01-16 20:38:40', 
timestamp '2022-01-17 00:00:00');

 date_trunc_interval
-
 2022-01-01 00:00:00

So here I'm not sure which result is correct.

It seems that the patch is still in progress, but I have some nitpicking.


+date_trunc_interval(interval, timestamptz, 
text)
+timestamptz  


It seems that 'timestamptz' in both argument and result descriptions 
should be replaced by 'timestamp with time zone' (see other functions 
descriptions). Though it is okay to use 'timestamptz' in SQL examples.


timestamp_trunc_interval_internal() and 
timestamptz_trunc_interval_internal() have similar code. I think they 
can be rewritten to avoid code duplication.


--
Artur

Re: truncating timestamps on arbitrary intervals

2020-03-30 Thread John Naylor

I wrote:

> I'm going to look into implementing timezone while awaiting comments on v6.

I attempted this in the attached v7. There are 4 new functions for
truncating timestamptz on an interval -- with and without origin, and
with and without time zone.

Parts of it are hackish, and need more work, but I think it's in
passable enough shape to get feedback on. The origin parameter logic
was designed with timestamps-without-time-zone in mind, and
retrofitting time zone on top of that was a bit messy. There might be
bugs.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v7-datetrunc_interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-03-25 Thread John Naylor

On Tue, Mar 24, 2020 at 9:34 PM Tels  wrote:
>
> Hello John,
>
> this looks like a nice feature. I'm wondering how it relates to the
> following use-case:
>
> When drawing charts, the user can select pre-defined widths on times
> (like "15 min", "1 hour").
>
> The data for these slots is fitted always to intervalls that start in 0
> in the slot, e.g. if the user selects "15 min", the interval always
> starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the
> resulting data, and so that requesting the same chart at xx:00 and xx:01
> actually draws the same chart, and not some bar with only one minute
> data at at the end.

Hi Tels, thanks for your interest! Sounds like the exact use case I had in mind.

> It is of course easy for things like "1 hour", but for yearly I had to
> come up with things like:
>
>EXTRAC(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime)
>
> and its gets even more involved for weeks, weekdays or odd things like
> fortnights.

To be clear, this particular case was already handled by the existing
date_trunc, but only single units and a few other special cases. I
understand if you have to write code to handle 15 minutes, you need to
use that structure for all cases.

Fortnight would be trivial:

date_trunc_interval('14 days'::interval, thetime [, optional start of
the fortnight])

...but weekdays would still need something extra.

> So would that mean one could replace this by:
>
>   date_trunc_interval('3600 seconds'::interval, thetime)
>
> and it would be easier, and (hopefully) faster?

Certainly easier and more flexible. It's hard to make guesses about
performance, but with your example above where you have two SQL
function calls plus some expression evaluation, I think a single
function would be faster in many cases.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: truncating timestamps on arbitrary intervals

2020-03-24 Thread Tels


Hello John,

this looks like a nice feature. I'm wondering how it relates to the 
following use-case:


When drawing charts, the user can select pre-defined widths on times 
(like "15 min", "1 hour").


The data for these slots is fitted always to intervalls that start in 0 
in the slot, e.g. if the user selects "15 min", the interval always 
starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the 
resulting data, and so that requesting the same chart at xx:00 and xx:01 
actually draws the same chart, and not some bar with only one minute 
data at at the end.


In PSQL, this works out to using this as GROUP BY and then summing up 
all values:


  SELECT FLOOR(EXTRACT(EPOCH from thetime) / 3600) * 3600, SUM(events) 
FROM mytable ... GROUP BY 1;


whereas here 3600 means "hourly".

It is of course easy for things like "1 hour", but for yearly I had to 
come up with things like:


  EXTRAC(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime)

and its gets even more involved for weeks, weekdays or odd things like 
fortnights.


So would that mean one could replace this by:

 date_trunc_interval('3600 seconds'::interval, thetime)

and it would be easier, and (hopefully) faster?

Best regards,

Tels

Re: truncating timestamps on arbitrary intervals

2020-03-24 Thread John Naylor

On Sun, Mar 15, 2020 at 2:26 PM I wrote:
>
> To get more logical behavior, perhaps the optional parameter is better
> as an offset instead of an origin. Alternatively (or additionally),
> the function could do the math on int64 timestamps directly.

For v6, I changed the algorithm to use pg_tm for months and years, and
int64 for all smaller units. Despite the split, I think it's easier to
read now, and certainly shorter. This has the advantage that now
mixing units is allowed, as long as you don't mix months/years with
days or smaller, which often doesn't make sense and is not very
practical. (not yet documented) One consequence of this is that when
operating on months/years, and the origin contains smaller units, the
smaller units are ignored. Example:

select date_trunc_interval('12 months'::interval, timestamp
'2012-03-01 01:21:01', timestamp '2011-03-22');
 date_trunc_interval
-
 2012-03-01 00:00:00
(1 row)

Even though not quite a full year has passed, it ignores the days in
the origin time and detects a difference in 12 calendar months. That
might be fine, although we could also throw an error and say origins
must be in the form of '-01-01 00:00:00' when truncating on months
and/or years.

I added a sketch of documentation for the origin parameter and more tests.

On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland  wrote:
> I'm confused by this. If my calendars are correct, both 1900-01-02 and 
> 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are 
> both Tuesday, shouldn't the day part be left alone when truncating to 7 days? 
> Also, I'd like to confirm that the default starting point for 7 day periods 
> (weeks) is Monday, per ISO.

This is fixed.

select date_trunc_interval('7 days'::interval, timestamp '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
-
 2020-02-11 00:00:00
(1 row)

select date_trunc_interval('7 days'::interval, d), count(*) from
generate_series( '2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
-+---
 2020-01-27 00:00:00 | 2
 2020-02-03 00:00:00 | 7
 2020-02-10 00:00:00 | 7
 2020-02-17 00:00:00 | 7
 2020-02-24 00:00:00 | 7
 2020-03-02 00:00:00 | 7
 2020-03-09 00:00:00 | 7
 2020-03-16 00:00:00 | 7
 2020-03-23 00:00:00 | 7
 2020-03-30 00:00:00 | 2
(10 rows)

> Perhaps the starting point for dates should be either 0001-01-01 (the 
> proleptic beginning of the CE calendar) or 2001-01-01 (the beginning of the 
> current 400-year repeating cycle of leap years and weeks, and a Monday, 
> giving the appropriate ISO result for truncating to 7 day periods).

I went ahead with 2001-01-01 for the time being.

On Thu, Mar 19, 2020 at 4:20 PM Artur Zakirov  wrote:
>
> =# select date_trunc_interval('1.1 year'::interval, TIMESTAMP
> '2020-02-01 01:21:01');
> ERROR:  only one interval unit allowed for truncation

For any lingering cases like this (i don't see any), maybe an error
hint is in order. The following works now, as expected for 1 year 1
month:

select date_trunc_interval('1.1 year'::interval, timestamp '2002-05-01
01:21:01');
 date_trunc_interval
-
 2002-02-01 00:00:0

I'm going to look into implementing timezone while awaiting comments on v6.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v6-datetrunc_interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-03-19 Thread Artur Zakirov


Hello,

On 3/13/2020 4:13 PM, John Naylor wrote:

I've put off adding documentation on the origin piece pending comments
about the approach.

I haven't thought seriously about timezone yet, but hopefully it's
just work and nothing to think too hard about.


Thank you for the patch. I looked it and tested a bit.

There is one interesting case which might be mentioned in the 
documentation or in the tests is the following. The function has 
interesting behaviour with real numbers:


=# select date_trunc_interval('0.1 year'::interval, TIMESTAMP 
'2020-02-01 01:21:01');

 date_trunc_interval
-
 2020-02-01 00:00:00

=# select date_trunc_interval('1.1 year'::interval, TIMESTAMP 
'2020-02-01 01:21:01');

ERROR:  only one interval unit allowed for truncation

It is because the second interval has two interval units:

=# select '0.1 year'::interval;
 interval
--
 1 mon

=# select '1.1 year'::interval;
   interval
--
 1 year 1 mon

--
Artur

Re: truncating timestamps on arbitrary intervals

2020-03-15 Thread John Naylor

On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland  wrote:
>
> On Fri, 13 Mar 2020 at 03:13, John Naylor  wrote:
>
>> - align weeks to start on Sunday
>> select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
>> 01:01:01.0', TIMESTAMP '1900-01-02');
>>  date_trunc_interval
>> -
>>  2020-02-09 00:00:00
>> (1 row)
>
>
> I'm confused by this. If my calendars are correct, both 1900-01-02 and 
> 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are 
> both Tuesday, shouldn't the day part be left alone when truncating to 7 days?

Thanks for taking a look! The non-intuitive behavior you found is
because the patch shifts the timestamp before converting to the
internal pg_tm type. The pg_tm type stores day of the month, which is
used for the calculation. It's not counting the days since the origin.
Then the result is shifted back.

To get more logical behavior, perhaps the optional parameter is better
as an offset instead of an origin. Alternatively (or additionally),
the function could do the math on int64 timestamps directly.

> Also, I'd like to confirm that the default starting point for 7 day periods 
> (weeks) is Monday, per ISO.

That's currently the behavior in the existing date_trunc function,
when passed the string 'week'. Given that keyword, it calculates the
week of the year.

When using the proposed function with arbitrary intervals, it uses day
of the month, as found in the pg_tm struct. It doesn't treat 7 days
differently then 5 or 10 without user input (origin or offset), since
there is nothing special about 7 day intervals as such internally. To
show the difference between date_trunc, and date_trunc_interval as
implemented in v5 with no origin:

select date_trunc('week', d), count(*) from generate_series(
'2020-02-01'::timestamp, '2020-03-31', '1 day') d group by 1 order by
1;
 date_trunc  | count
-+---
 2020-01-27 00:00:00 | 2
 2020-02-03 00:00:00 | 7
 2020-02-10 00:00:00 | 7
 2020-02-17 00:00:00 | 7
 2020-02-24 00:00:00 | 7
 2020-03-02 00:00:00 | 7
 2020-03-09 00:00:00 | 7
 2020-03-16 00:00:00 | 7
 2020-03-23 00:00:00 | 7
 2020-03-30 00:00:00 | 2
(10 rows)

select date_trunc_interval('7 days'::interval, d), count(*) from
generate_series( '2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
-+---
 2020-02-01 00:00:00 | 7
 2020-02-08 00:00:00 | 7
 2020-02-15 00:00:00 | 7
 2020-02-22 00:00:00 | 7
 2020-02-29 00:00:00 | 1
 2020-03-01 00:00:00 | 7
 2020-03-08 00:00:00 | 7
 2020-03-15 00:00:00 | 7
 2020-03-22 00:00:00 | 7
 2020-03-29 00:00:00 | 3
(10 rows)

Resetting the day every month is counterintuitive if not broken, and
as I mentioned it might make more sense to use the int64 timestamp
directly, at least for intervals less than one month. I'll go look
into doing that.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: truncating timestamps on arbitrary intervals

2020-03-13 Thread Isaac Morland

On Fri, 13 Mar 2020 at 03:13, John Naylor 
wrote:

> On Wed, Feb 26, 2020 at 11:36 PM Tom Lane  wrote:
> >
> > * In general, binning involves both an origin and a stride.  When
> > working with plain numbers it's almost always OK to set the origin
> > to zero, but it's less clear to me whether that's all right for
> > timestamps.  Do we need another optional argument?  Even if we
> > don't, "zero" for tm_year is 1900, which is going to give results
> > that surprise somebody.
>

- align weeks to start on Sunday
> select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
> 01:01:01.0', TIMESTAMP '1900-01-02');
>  date_trunc_interval
> -
>  2020-02-09 00:00:00
> (1 row)
>

I'm confused by this. If my calendars are correct, both 1900-01-02
and 2020-02-11 are Tuesdays. So if the date being adjusted and the origin
are both Tuesday, shouldn't the day part be left alone when truncating to 7
days? Also, I'd like to confirm that the default starting point for 7 day
periods (weeks) is Monday, per ISO. I know it's very fashionable in North
America to split the weekend in half but it's not the international
standard.

Perhaps the starting point for dates should be either 0001-01-01 (the
proleptic beginning of the CE calendar) or 2001-01-01 (the beginning of the
current 400-year repeating cycle of leap years and weeks, and a Monday,
giving the appropriate ISO result for truncating to 7 day periods).

Re: truncating timestamps on arbitrary intervals

2020-03-13 Thread John Naylor

On Wed, Feb 26, 2020 at 11:36 PM Tom Lane  wrote:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

I tried the simplest way in the attached v5. Examples (third param is origin):

-- same result as no origin:
select date_trunc_interval('5 min'::interval, TIMESTAMP '2020-02-01
01:01:01', TIMESTAMP '2020-02-01');
 date_trunc_interval
-
 2020-02-01 01:00:00
(1 row)

-- shift bins by 2.5 min:
select date_trunc_interval('5 min'::interval, TIMESTAMP '2020-02-1
01:01:01', TIMESTAMP '2020-02-01 00:02:30');
 date_trunc_interval
-
 2020-02-01 00:57:30
(1 row)

-- align weeks to start on Sunday
select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
-
 2020-02-09 00:00:00
(1 row)

I've put off adding documentation on the origin piece pending comments
about the approach.

I haven't thought seriously about timezone yet, but hopefully it's
just work and nothing to think too hard about.


--
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v5-datetrunc_interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-02-28 Thread John Naylor

On Wed, Feb 26, 2020 at 11:36 PM Tom Lane  wrote:
>
> John Naylor  writes:
> > [ v3-datetrunc_interval.patch ]
>
> A few thoughts:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

Not sure.

A surprise I foresee in general might be: '1 week' is just '7 days',
and not aligned on "WOY". Since the function is passed an interval and
not text, we can't raise a warning. But date_trunc() already covers
that, so probably not a big deal.

> * I'm still not convinced that the code does the right thing for
> 1-based months or days.  Shouldn't you need to subtract 1, then
> do the modulus, then add back 1?

Yes, brain fade on my part. Fixed in the attached v4.

> * Speaking of modulus, would it be clearer to express the
> calculations like
> timestamp -= timestamp % interval;
> (That's just a question, I'm not sure.)

Seems nicer, so done that way.

> * Code doesn't look to have thought carefully about what to do with
> negative intervals, or BC timestamps.

By accident, negative intervals currently behave the same as positive
ones. We could make negative intervals round up rather than truncate
(or vice versa). I don't know the best thing to do here.

As for BC, changed so it goes in the correct direction at least, and added test.

> * The comment
>  * Justify all lower timestamp units and throw an error if any
>  * of the lower interval units are non-zero.
> doesn't seem to have a lot to do with what the code after it actually
> does.  Also, you need explicit /* FALLTHRU */-type comments in that
> switch, or pickier buildfarm members will complain.

Stale comment from an earlier version, fixed. Not sure if "justify" is
the right term, but "zero" is a bit misleading. Added fall thru's.

> * Seems like you could jam all the unit-related error checking into
> that switch's default: case, where it will cost nothing if the
> call is valid:
>
> switch (unit)
> {
> ...
> default:
> if (unit == 0)
> // complain about zero interval
> else
> // complain about interval with multiple components
> }

Done.

> * I'd use ERRCODE_INVALID_PARAMETER_VALUE for any case of disallowed
> contents of the interval.

Done.

Also removed the millisecond case, since it's impossible, or at least
not worth it, to distinguish from the microsecond case.

Note: I haven't done any additional docs changes in v4.

TODO: with timezone

possible TODO: origin parameter

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v4-datetrunc_interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-02-26 Thread David Fetter

On Wed, Feb 26, 2020 at 06:38:57PM +0800, John Naylor wrote:
> On Wed, Feb 26, 2020 at 3:51 PM David Fetter  wrote:
> >
> > I believe the following should error out, but doesn't.
> >
> > # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 
> > 20:38:40');
> >  date_trunc_interval
> > ═
> >  2001-01-01 00:00:00
> > (1 row)
> 
> You're quite right. I forgot to add error checking for
> second-and-below units. I've added your example to the tests. (I
> neglected to mention in my first email that because I chose to convert
> the interval to the pg_tm struct (seemed easiest), it's not
> straightforward how to allow multiple unit types, and I imagine the
> use case is small, so I had it throw an error.)

I suspect that this could be sanely expanded to span some sets of
adjacent types in a future patch, e.g. year + month or hour + minute.

> > Please find attached an update that I believe fixes the bug I found in
> > a principled way.
> 
> Thanks for that! I made a couple adjustments and incorporated your fix
> into v3: While working on v1, I noticed the DTK_FOO macros already had
> an idiom for bitmasking (see utils/datetime.h),

Oops!  Sorry I missed that.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: truncating timestamps on arbitrary intervals

2020-02-26 Thread Tom Lane

John Naylor  writes:
> [ v3-datetrunc_interval.patch ]

A few thoughts:

* In general, binning involves both an origin and a stride.  When
working with plain numbers it's almost always OK to set the origin
to zero, but it's less clear to me whether that's all right for
timestamps.  Do we need another optional argument?  Even if we
don't, "zero" for tm_year is 1900, which is going to give results
that surprise somebody.

* I'm still not convinced that the code does the right thing for
1-based months or days.  Shouldn't you need to subtract 1, then
do the modulus, then add back 1?

* Speaking of modulus, would it be clearer to express the
calculations like
timestamp -= timestamp % interval;
(That's just a question, I'm not sure.)

* Code doesn't look to have thought carefully about what to do with
negative intervals, or BC timestamps.

* The comment 
 * Justify all lower timestamp units and throw an error if any
 * of the lower interval units are non-zero.
doesn't seem to have a lot to do with what the code after it actually
does.  Also, you need explicit /* FALLTHRU */-type comments in that
switch, or pickier buildfarm members will complain.

* Seems like you could jam all the unit-related error checking into
that switch's default: case, where it will cost nothing if the
call is valid:

switch (unit)
{
...
default:
if (unit == 0)
// complain about zero interval
else
// complain about interval with multiple components
}

* I'd use ERRCODE_INVALID_PARAMETER_VALUE for any case of disallowed
contents of the interval.

regards, tom lane

Re: truncating timestamps on arbitrary intervals

2020-02-26 Thread John Naylor

On Wed, Feb 26, 2020 at 3:51 PM David Fetter  wrote:
>
> I believe the following should error out, but doesn't.
>
> # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
>  date_trunc_interval
> ═
>  2001-01-01 00:00:00
> (1 row)

You're quite right. I forgot to add error checking for
second-and-below units. I've added your example to the tests. (I
neglected to mention in my first email that because I chose to convert
the interval to the pg_tm struct (seemed easiest), it's not
straightforward how to allow multiple unit types, and I imagine the
use case is small, so I had it throw an error.)

> Please find attached an update that I believe fixes the bug I found in
> a principled way.

Thanks for that! I made a couple adjustments and incorporated your fix
into v3: While working on v1, I noticed the DTK_FOO macros already had
an idiom for bitmasking (see utils/datetime.h), so I used that instead
of a bespoke enum. Also, since the bitmask is checked once, I removed
the individual member checks, allowing me to remove all the gotos.

There's another small wrinkle: Since we store microseconds internally,
it's neither convenient nor useful to try to error out for things like
'2 ms 500 us', since that is just as well written as '2500 us', and
stored exactly the same. I'm inclined to just skip the millisecond
check and just use microseconds, but I haven't done that yet.

Also, I noticed this bug in v1:

SELECT date_trunc_interval('17 days', TIMESTAMP '2001-02-16 20:38:40.123456');
 date_trunc_interval
-
 2001-01-31 00:00:00
(1 row)

This is another consequence of month and day being 1-based. Fixed,
with new tests.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v3-datetrunc_interval.patch
Description: Binary data

Re: truncating timestamps on arbitrary intervals

2020-02-25 Thread David Fetter

On Wed, Feb 26, 2020 at 10:50:19AM +0800, John Naylor wrote:
> Hi,
> 
> When analyzing time-series data, it's useful to be able to bin
> timestamps into equally spaced ranges. date_trunc() is only able to
> bin on a specified whole unit.

Thanks for adding this very handy feature!

> In the attached patch for the March
> commitfest, I propose a new function date_trunc_interval(), which can
> truncate to arbitrary intervals, e.g.:
> 
> select date_trunc_interval('15 minutes', timestamp '2020-02-16
> 20:48:40'); date_trunc_interval
> -
>  2020-02-16 20:45:00
> (1 row)

I believe the following should error out, but doesn't.

# SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
 date_trunc_interval 
═
 2001-01-01 00:00:00
(1 row)

> With this addition, it might be possible to turn the existing
> date_trunc() functions into wrappers. I haven't done that here because
> it didn't seem practical at this point. For one, the existing
> functions have special treatment for weeks, centuries, and millennia.

I agree that turning it into a wrapper would be separate work.

> Note: I've only written the implementation for the type timestamp
> without timezone. Adding timezone support would be pretty simple,
> but I wanted to get feedback on the basic idea first before making
> it complete. I've also written tests and very basic documentation.

Please find attached an update that I believe fixes the bug I found in
a principled way.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 5e36c4c888c65e358d2f87d84b64bc14d52f2b39 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 25 Feb 2020 23:49:35 -0800
Subject: [PATCH v2] Add date_trunc_interval(interval, timestamp)
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.24.1"

This is a multi-part message in MIME format.
--2.24.1
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


per John Naylor

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ceda48e0fc..3863c222a2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6949,6 +6949,15 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 2 days 03:00:00

 
+   
+date_trunc_interval(interval, timestamp)
+timestamp
+Truncate to specified precision; see 
+
+date_trunc_interval('15 minutes', timestamp '2001-02-16 20:38:40')
+2001-02-16 20:30:00
+   
+

 
  
@@ -7818,7 +7827,7 @@ SELECT date_part('hour', INTERVAL '4 hours 3 minutes');
   
 
   
-   date_trunc
+   date_trunc, date_trunc_interval
 

 date_trunc
@@ -7902,6 +7911,21 @@ SELECT date_trunc('hour', INTERVAL '3 days 02:47:33');
 Result: 3 days 02:00:00
 

+
+   
+The function date_trunc_interval is 
+similar to the date_trunc, except that it
+truncates to an arbitrary interval.
+   
+
+   
+Example:
+
+SELECT date_trunc_interval('5 minutes', TIMESTAMP '2001-02-16 20:38:40');
+Result: 2001-02-16 20:35:00
+
+   
+
   
 
   
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 0b6c9d5ea8..ed742592af 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -30,6 +30,7 @@
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "parser/scansup.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/datetime.h"
@@ -3804,6 +3805,144 @@ timestamptz_age(PG_FUNCTION_ARGS)
  *-*/
 
 
+/* timestamp_trunc_interval()
+ * Truncate timestamp to specified interval.
+ */
+Datum
+timestamp_trunc_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *interval = PG_GETARG_INTERVAL_P(0);
+	Timestamp	timestamp = PG_GETARG_TIMESTAMP(1);
+	Timestamp	result;
+	fsec_t		ifsec,
+tfsec;
+	uint32_t	unit = 0,
+popcount = 0;
+	enum		TimeUnit {
+	us = 1 << 0,
+	ms = 1 << 1,
+	second = 1 << 2,
+	minute = 1 << 3,
+	hour   = 1 << 4,
+	day= 1 << 5,
+	month  = 1 << 6,
+	year   = 1 << 7
+};
+
+	struct pg_tm it;
+	struct pg_tm tt;
+
+	if (TIMESTAMP_NOT_FINITE(timestamp))
+		PG_RETURN_TIMESTAMP(timestamp);
+
+	if (interval2tm(*interval, , ) != 0)
+		elog(ERROR, "could not convert interval to tm");
+
+	if (timestamp2tm(timestamp, NULL, , , NULL, NULL) != 0)
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("timestamp out of range")));
+
+	if (it.tm_year != 0)
+	{
+		tt.tm_year = it.tm_year * (tt.tm_year / it.tm_year);
+		unit |= year;
+	}
+	if (it.tm_mon != 0)
+	{
+		tt.tm_mon = it.tm_mon * (tt.tm_mon / it.tm_mon);
+		unit |= month;
+	}
+	if (it.tm_mday != 0)
+	{
+		tt.tm_mday = it.tm_mday * (tt.tm_mday / it.tm_mday);
+

truncating timestamps on arbitrary intervals

2020-02-25 Thread John Naylor

Hi,

When analyzing time-series data, it's useful to be able to bin
timestamps into equally spaced ranges. date_trunc() is only able to
bin on a specified whole unit. In the attached patch for the March
commitfest, I propose a new function date_trunc_interval(), which can
truncate to arbitrary intervals, e.g.:

select date_trunc_interval('15 minutes', timestamp '2020-02-16
20:48:40'); date_trunc_interval
-
 2020-02-16 20:45:00
(1 row)

With this addition, it might be possible to turn the existing
date_trunc() functions into wrappers. I haven't done that here because
it didn't seem practical at this point. For one, the existing
functions have special treatment for weeks, centuries, and millennia.

Note: I've only written the implementation for the type timestamp
without timezone. Adding timezone support would be pretty simple, but
I wanted to get feedback on the basic idea first before making it
complete. I've also written tests and very basic documentation.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v1-datetrunc_interval.patch
Description: Binary data

47 matches

Mail list logo