Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Erik Nordström
Tom,

You are of course right that you cannot add precision that was not there to
begin with. My patch does nothing for input values that cannot be
represented accurately to begin with. However, that was not my main point.

The idea with the calculation is this: When you remove the integral/seconds
part of the value before converting to integral microseconds, you are
creating a number that a float can represent more accurately than the
original value (simply because there are fewer digits in the
mantissa/significand). Thus, when doing 0.312311172485352 *
USECS_PER_SEC, you suffer fewer precision-related errors with regard to
microseconds than when doing 14864803242.312311172485352 * USECS_PER_SEC as
in the original code. In other words, with this approach there are fewer
cases where 1 ULP is bigger than 1 microsecond.

FWIW, this is the output from latest postgres HEAD, which includes your
rint() patch:

postgres=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312312+01
(1 row)

And this is the output with my patch:

postgres=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312311+01
(1 row)


I don't know why you would get different output (that would be worrying in
itself).

In any case, I would prefer a to_timestamp(numeric), although I see no
reason not to make the float8 version as accurate as possible anyway.

-Erik



On Thu, Feb 9, 2017 at 3:56 PM, Tom Lane  wrote:

> Erik Nordström writes:
> > Thanks for the insightful feedback. You are right, the patch does suffer
> > from overflow (and other possible issues when I think about it). Using
> > rint(), as you suggest, helps in my original example case, but there are
> > still cases when the output is not what you would expect. For instance,
> > converting the Unix time 14864803242.312311 gives back the timestamp
> > "2441-01-17 09:00:42.312312+01", even if using rint() (see below).
>
> Hm, that particular case works for me using the patch I committed
> yesterday (which just added a rint() call to the existing code).
> I'm a bit skeptical of the version you propose here because it seems
> mighty prone to subtractive cancellation.  You're basically computing
> x - int(x) which can't add any precision that wasn't there before.
> Looking at successive microsecond values in this range:
>
> regression=# select 14864803242.312310::float8 - 14864803242;
>  ?column?
> ---
>  0.312309265136719
> (1 row)
>
> regression=# select 14864803242.312311::float8 - 14864803242;
>  ?column?
> ---
>  0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312312::float8 - 14864803242;
>  ?column?
> ---
>  0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312313::float8 - 14864803242;
>  ?column?
> ---
>  0.312313079833984
> (1 row)
>
> Basically, 1 ULP in this range is more than 1 microsecond, so there are
> going to be places where you cannot get the right answer.  Reformulating
> the computation will just shift the errors to nearby values.  float8
> simply doesn't have enough bits to represent this many microseconds since
> 1970 exactly, and the problem's only going to get worse the further out
> you look.
>
> I think we might be better advised to add to_timestamp(numeric) alongside
> to_timestamp(float8).  There's plenty of precedent for that (e.g., exp(),
> ln()) so I would not expect problems with ambiguous function calls.
> It'd be slower though ...
>
> regards, tom lane
>


Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Erik Nordström
Hi Tom, and others,

Thanks for the insightful feedback. You are right, the patch does suffer
from overflow (and other possible issues when I think about it). Using
rint(), as you suggest, helps in my original example case, but there are
still cases when the output is not what you would expect. For instance,
converting the Unix time 14864803242.312311 gives back the timestamp
"2441-01-17 09:00:42.312312+01", even if using rint() (see below).

I guess precision-related errors are unavoidable when working with floating
point numbers (especially large ones). But I wonder whether it isn't
desirable to avoid (or reduce) errors at least in the case where the input
value can be accurately represented to the microsecond by a float (i.e.,
when rounded to microseconds, as printf does)? For example, splitting the
float into second and microsecond integers might help reduce errors,
especially with large numbers. Something like this:

ts_seconds = (int64) seconds;
ts_microseconds = (int64) rint((seconds - ts_seconds) * USECS_PER_SEC);
result = (ts_seconds - epoch_diff_seconds) * USECS_PER_SEC + ts_microseconds;

Note that stripping off the full seconds before scaling the fractional
part keeps the microseconds more accurate, and it should solve the problem
for the case above, including the overflow issue.

Or maybe I am overthinking this, and the only proper solution is to provide
a to_timestamp() that takes a microsecond integer? That certainly wouldn't
hurt in any case, and it makes sense given that the timestamp itself is
internally an integer.

Anyway, I am attaching an updated version of the patch. Below are some
example outputs (including the minimum and maximum allowed inputs) from
original Postgres, with your rint() fix, and with the attached patch:

Original postgres:

test=# select to_timestamp(9224318015999);
  to_timestamp
-
 294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
  to_timestamp
-
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
 to_timestamp
---
 2017-02-07 16:12:04.236537+01
(1 row)

test=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312312+01
(1 row)


With rint():

test=# select to_timestamp(9224318015999);
  to_timestamp
-
 294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
  to_timestamp
-
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
 to_timestamp
---
 2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312312+01
(1 row)


Included patch:

test=# select to_timestamp(9224318015999);
   to_timestamp
--
 294277-01-01 00:59:59+01
(1 row)

test=# select to_timestamp(-210866803200);
  to_timestamp
-
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
 to_timestamp
---
 2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312311+01
(1 row)


--Erik


On Wed, Feb 8, 2017 at 9:52 PM, Tom Lane  wrote:

> I wrote:
> > I wonder if we could make things better just by using rint() rather than
> > a naive cast-to-integer.  The cast will truncate not round, and I think
> > that might be what's mostly biting you.  Does this help for you?
>
> > #ifdef HAVE_INT64_TIMESTAMP
> > - result = seconds * USECS_PER_SEC;
> > + result = rint(seconds * USECS_PER_SEC);
> > #else
>
> I poked around looking for possible similar issues elsewhere, and found
> that a substantial majority of the datetime-related code already uses
> rint() when trying to go from float to int representations.  As far as
> I can find, this function and make_interval() are the only ones that
> fail to do so.  So I'm now thinking that this is a clear oversight,
> and both those places need to be patched to use rint().
>
> regards, tom lane
>


to_timestamp_fix_2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Tom Lane
Erik Nordström writes:
> Thanks for the insightful feedback. You are right, the patch does suffer
> from overflow (and other possible issues when I think about it). Using
> rint(), as you suggest, helps in my original example case, but there are
> still cases when the output is not what you would expect. For instance,
> converting the Unix time 14864803242.312311 gives back the timestamp
> "2441-01-17 09:00:42.312312+01", even if using rint() (see below).

Hm, that particular case works for me using the patch I committed
yesterday (which just added a rint() call to the existing code).
I'm a bit skeptical of the version you propose here because it seems
mighty prone to subtractive cancellation.  You're basically computing
x - int(x) which can't add any precision that wasn't there before.
Looking at successive microsecond values in this range:

regression=# select 14864803242.312310::float8 - 14864803242;
 ?column?  
---
 0.312309265136719
(1 row)

regression=# select 14864803242.312311::float8 - 14864803242;
 ?column?  
---
 0.312311172485352
(1 row)

regression=# select 14864803242.312312::float8 - 14864803242;
 ?column?  
---
 0.312311172485352
(1 row)

regression=# select 14864803242.312313::float8 - 14864803242;
 ?column?  
---
 0.312313079833984
(1 row)

Basically, 1 ULP in this range is more than 1 microsecond, so there are
going to be places where you cannot get the right answer.  Reformulating
the computation will just shift the errors to nearby values.  float8
simply doesn't have enough bits to represent this many microseconds since
1970 exactly, and the problem's only going to get worse the further out
you look.

I think we might be better advised to add to_timestamp(numeric) alongside
to_timestamp(float8).  There's plenty of precedent for that (e.g., exp(),
ln()) so I would not expect problems with ambiguous function calls.
It'd be slower though ...

regards, tom lane




Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-08 Thread Tom Lane
I wrote:
> I wonder if we could make things better just by using rint() rather than
> a naive cast-to-integer.  The cast will truncate not round, and I think
> that might be what's mostly biting you.  Does this help for you?

> #ifdef HAVE_INT64_TIMESTAMP
> - result = seconds * USECS_PER_SEC;
> + result = rint(seconds * USECS_PER_SEC);
> #else

I poked around looking for possible similar issues elsewhere, and found
that a substantial majority of the datetime-related code already uses
rint() when trying to go from float to int representations.  As far as
I can find, this function and make_interval() are the only ones that
fail to do so.  So I'm now thinking that this is a clear oversight,
and both those places need to be patched to use rint().

regards, tom lane




Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-08 Thread Tom Lane
Erik Nordström writes:
> I stumbled upon a precision issue with the to_timestamp() function that
> causes it to return unexpected timestamp values. For instance, the query
> SELECT to_timestamp(1486480176.236538) returns the timestamp "2017-02-07
> 16:09:36.236537+01", which is off by one microsecond. Looking at the source
> code, the issue seems to be that the conversion is unnecessarily done using
> imprecise floating point calculations. Since the target timestamp has
> microsecond precision, and is internally represented by a 64-bit integer
> (on modern platforms), it is better to first convert the given floating
> point value to a microsecond integer and then do the epoch conversion,
> rather than doing the conversion using floating point and finally casting
> to an integer/timestamp.

This change would introduce overflow failures near the end of the range of
valid inputs.  Maybe it's worth doing anyway and we should just tighten
the range bound tests right above what you patched, but I'm a bit
skeptical.  Float inputs are going to be inherently imprecise anyhow.

I wonder if we could make things better just by using rint() rather than
a naive cast-to-integer.  The cast will truncate not round, and I think
that might be what's mostly biting you.  Does this help for you?

#ifdef HAVE_INT64_TIMESTAMP
-   result = seconds * USECS_PER_SEC;
+   result = rint(seconds * USECS_PER_SEC);
#else

regards, tom lane

