Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-31 Thread Robert Haas
On Sat, Jan 31, 2015 at 7:07 PM, Peter Geoghegan  wrote:
> I don't want to get bogged down on this - the numeric abbreviation
> patch *is* still much more compelling - but maybe abbreviation of
> float8 isn't a red herring after all.

I'm completely on-board with doing something about numeric.  I think
it might be pretty foolish to try to do anything about any data type
the CPU has hard-wired knowledge of.  We're basically betting that we
can do better in software than they did in hardware, and even if that
happens to be true on some systems under some circumstances, it leaves
us in a poor position to leverage future improvements to the silicon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-31 Thread Peter Geoghegan
On Mon, Jan 26, 2015 at 3:35 PM, Peter Geoghegan  wrote:
> I am not seriously suggesting pursuing abbreviation for float8 in the
> near term - numeric is clearly what we should concentrate on. It's
> interesting that abbreviation of float8 could potentially make sense,
> though.

Note that in the IEEE 754 standard, the exponent does not have a sign.
Rather, it is stored with a bias (127 for single precision floats, and
1023 for double precision floats) that is subtracted to recover the
true exponent. This, together with the bit layout of the mantissa,
allows floats to be compared and sorted correctly even when
interpreting them as integers. The exception is NaN, but then we have
an exception to that exception.
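
To make the biased exponent concrete, here is a small Python sketch. Python
is used purely for illustration, and the helper names float8_bits and
biased_exponent are made up for this example. It extracts the raw exponent
field of a double and checks that non-negative doubles order the same way as
their 64-bit integer images:

```python
import struct

def float8_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def biased_exponent(x: float) -> int:
    """Return the raw 11-bit exponent field (true exponent plus 1023)."""
    return (float8_bits(x) >> 52) & 0x7FF

# The stored exponent is unsigned; subtracting the bias of 1023 gives the true one.
for value in (0.5, 1.0, 2.0):
    print(value, biased_exponent(value), biased_exponent(value) - 1023)

# Non-negative doubles sort the same way as their integer bit patterns.
positives = [3.0, 0.25, 1e300, 0.0, 1.5, 1e-300]
assert sorted(positives) == sorted(positives, key=float8_bits)
```

The check deliberately uses only non-negative inputs; negative values need
separate handling.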

This is a really old idea, actually. I first saw it in a paper written
in the 1960s, long before math coprocessors became standard. Haven't
really thrashed this out enough, but offhand I guess it would work.

The other problem is that positive IEEE floating-point numbers sort
like integers with the same bits, and negative IEEE floating-point
numbers sort in the reverse order of integers with the same bits. So
we'd probably end up with an encoding scheme that accounted for that,
and forget about tie-breakers (or have a NOOP "return 0" tie-breaker).
An example of the problem:

postgres=# create table foo (a float8);
CREATE TABLE
postgres=# insert into foo values (1), (2), (3), (-1), (-2), (-3);
INSERT 0 6
postgres=# select * from foo order by a;
 a
----
 -1
 -2
 -3
  1
  2
  3
(6 rows)
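
One possible encoding along these lines, sketched in Python for illustration
only. The function name float8_sort_key is hypothetical, and the sketch
assumes NaN needs no special treatment and that -0.0 should be folded into
0.0. It flips the sign bit of non-negative values and inverts every bit of
negative values, so that plain unsigned integer comparison gives the float
order. Sorting on the raw bits read as signed integers reproduces the order
in the listing above.

```python
import struct

SIGN_BIT = 1 << 63
ALL_BITS = (1 << 64) - 1

def raw_signed_bits(x: float) -> int:
    """The double's bits read as a signed 64-bit integer (no fix-up at all)."""
    return struct.unpack(">q", struct.pack(">d", x))[0]

def float8_sort_key(x: float) -> int:
    """Map a double to an unsigned 64-bit key whose integer order matches float order.

    Assumptions of this sketch: NaN is not handled specially, and -0.0 is
    folded into 0.0 so that both zeros get the same key.
    """
    if x == 0.0:
        x = 0.0  # -0.0 == 0.0 is true, so this normalizes negative zero
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits & SIGN_BIT:
        # Negative: invert every bit so larger magnitudes get smaller keys.
        return bits ^ ALL_BITS
    # Non-negative: set the sign bit so these keys sort above all negatives.
    return bits | SIGN_BIT

values = [1.0, 2.0, 3.0, -1.0, -2.0, -3.0]
print(sorted(values, key=raw_signed_bits))   # prints [-1.0, -2.0, -3.0, 1.0, 2.0, 3.0]
print(sorted(values, key=float8_sort_key))   # prints [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
```

Because the key is a bijection on the non-NaN values (apart from the two
zeros), equal keys mean equal values, which is why a tie-breaker would have
nothing left to decide.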

The reason that this conversion usually doesn't occur in library
sorting routines is that it only helps significantly on x86, has
additional memory overhead, and ordinarily requires that we convert
back when we're done sorting. The cost/benefit analysis for tuplesort
would be much more favorable than for a generic float sorting case,
given that we pretty much have datum1 storage as a sunk cost anyway,
and given that we don't need to convert back the datum1 representation,
and given that the encoding process would be dirt cheap and occur at a
time when we were likely totally bottlenecked on memory bandwidth.

I don't want to get bogged down on this - the numeric abbreviation
patch *is* still much more compelling - but maybe abbreviation of
float8 isn't a red herring after all.
-- 
Peter Geoghegan


Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-27 Thread Gavin Flower

On 28/01/15 06:29, Andrew Gierth wrote:
> "Peter" == Peter Geoghegan  writes:
>
>  Peter> What I find particularly interesting about this patch is that it
>  Peter> makes sorting numerics significantly faster than even sorting
>  Peter> float8 values,
>
> Played some more with this. Testing on some different gcc versions
> showed that the results were not consistent between versions; the latest
> I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8
> as slightly slower; the difference was all in the time of the float8
> case, the time for numeric was virtually the same.
>
> For one specific test query, taking the best time of multiple runs,
>
> float8:   gcc4.7 = 980ms, gcc4.9 = 833ms
> numeric:  gcc4.7 = 940ms, gcc4.9 = 920ms
>
> (vs. 650ms for bigint on either version)
>
> So honestly I think abbreviation for float8 is a complete red herring.
>
> Also, I couldn't get any detectable benefit from inlining
> DatumGetFloat8, though I may have to play more with that to be certain
> (above tests did not have any float8-related modifications at all, just
> the datum and numeric abbrevs patches).

Since gcc5.0 is due to be released in less than 3 months, it might be 
worth testing with that.



Cheers,
Gavin


Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-27 Thread Andrew Gierth
> "Peter" == Peter Geoghegan  writes:

 Peter> What I find particularly interesting about this patch is that it
 Peter> makes sorting numerics significantly faster than even sorting
 Peter> float8 values,

Played some more with this. Testing on some different gcc versions
showed that the results were not consistent between versions; the latest
I tried (4.9) showed float8 as somewhat faster, while 4.7 showed float8
as slightly slower; the difference was all in the time of the float8
case, the time for numeric was virtually the same.

For one specific test query, taking the best time of multiple runs,

float8:   gcc4.7 = 980ms, gcc4.9 = 833ms
numeric:  gcc4.7 = 940ms, gcc4.9 = 920ms

(vs. 650ms for bigint on either version)
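
Expressed as relative changes, those figures can be worked out with a few
lines of Python. This is only arithmetic on the numbers quoted above; the
helper name percent_change is made up for this example.

```python
# Best-of-run timings in milliseconds, copied from the figures above.
timings_ms = {
    "float8": {"gcc4.7": 980, "gcc4.9": 833},
    "numeric": {"gcc4.7": 940, "gcc4.9": 920},
}
BIGINT_MS = 650

def percent_change(old_ms: float, new_ms: float) -> float:
    """Relative change from old to new, in percent (negative means faster)."""
    return 100.0 * (new_ms - old_ms) / old_ms

for type_name, by_compiler in timings_ms.items():
    change = percent_change(by_compiler["gcc4.7"], by_compiler["gcc4.9"])
    print(f"{type_name}: {change:+.1f}% from gcc4.7 to gcc4.9")
    for compiler, ms in by_compiler.items():
        print(f"  {compiler}: {ms / BIGINT_MS:.2f}x the bigint time")
```

The float8 time moves by about 15% between the two compilers while the
numeric time moves by about 2%.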

So honestly I think abbreviation for float8 is a complete red herring.

Also, I couldn't get any detectable benefit from inlining
DatumGetFloat8, though I may have to play more with that to be certain
(above tests did not have any float8-related modifications at all, just
the datum and numeric abbrevs patches).

-- 
Andrew (irc:RhodiumToad)


Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-26 Thread Petr Jelinek

On 27/01/15 00:51, Andres Freund wrote:
> On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote:
> > On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth
> >  wrote:
> > > Obvious overheads in float8 comparison include having to check for NaN,
> > > and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
> > > a store/load to memory rather than just using a register. Looking at
> > > those might be more beneficial than messing with abbreviations.
> >
> > Aren't there issues with the alignment of double precision floating
> > point numbers on x86, too? Maybe my information there is at least
> > partially obsolete. But it seems we'd have to control for this to be
> > sure.
>
> I think getting rid of the function call for DatumGetFloat8() would be
> quite the win. On x86-64 the conversion then should amount to mov
> %rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both
> instructions have a cycle count of 1 + L1 access latency (4) + 2 because
> they use the same execution port. So it's about 12 fully pipelineable
> cycles. 2 if the pipeline can be kept busy otherwise. I doubt that'd be
> noticeable if the conversion were inlined.



IIRC the DatumGetFloat8 was quite visible in the perf when I was writing 
the array version of width_bucket. It was one of the motivations for 
making special float8 version since not having to call it had 
significant effect. Sadly I don't remember if it was the function call 
itself or the conversion anymore.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-26 Thread Andres Freund
On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote:
> On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth
>  wrote:
> > Obvious overheads in float8 comparison include having to check for NaN,
> > and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
> > a store/load to memory rather than just using a register. Looking at
> > those might be more beneficial than messing with abbreviations.
> 
> Aren't there issues with the alignment of double precision floating
> point numbers on x86, too? Maybe my information there is at least
> partially obsolete. But it seems we'd have to control for this to be
> sure.

I think getting rid of the function call for DatumGetFloat8() would be
quite the win. On x86-64 the conversion then should amount to mov
%rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both
instructions have a cycle count of 1 + L1 access latency (4) + 2 because
they use the same execution port. So it's about 12 fully pipelineable
cycles. 2 if the pipeline can be kept busy otherwise. I doubt that'd be
noticeable if the conversion were inlined.
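
For readers less familiar with what the conversion does: when a float8 is
passed by value inside a 64-bit Datum, the conversion is conceptually just a
reinterpretation of the same 64 bits as a double, which is why it can boil
down to a store and a load. The Python below only mimics that
reinterpretation. The function names are made up for this sketch, and it is
not PostgreSQL's implementation.

```python
import struct

def float8_get_datum_sketch(value: float) -> int:
    """Store a double's bits in a 64-bit unsigned integer (stand-in for a Datum)."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def datum_get_float8_sketch(datum: int) -> float:
    """Read the same 64 bits back as a double; no numeric conversion happens."""
    return struct.unpack("<d", struct.pack("<Q", datum))[0]

datum = float8_get_datum_sketch(1.5)
print(hex(datum))                      # prints 0x3ff8000000000000
print(datum_get_float8_sketch(datum))  # prints 1.5
```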

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-26 Thread Peter Geoghegan
On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth
 wrote:
> Obvious overheads in float8 comparison include having to check for NaN,
> and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
> a store/load to memory rather than just using a register. Looking at
> those might be more beneficial than messing with abbreviations.

Aren't there issues with the alignment of double precision floating
point numbers on x86, too? Maybe my information there is at least
partially obsolete. But it seems we'd have to control for this to be
sure.

I am not seriously suggesting pursuing abbreviation for float8 in the
near term - numeric is clearly what we should concentrate on. It's
interesting that abbreviation of float8 could potentially make sense,
though.
-- 
Peter Geoghegan


Re: [HACKERS] Re: Abbreviated keys for Numeric

2015-01-26 Thread Andrew Gierth
> "Peter" == Peter Geoghegan  writes:

 Peter> What I find particularly interesting about this patch is that it
 Peter> makes sorting numerics significantly faster than even sorting
 Peter> float8 values,

I get a much smaller difference there than you do.

Obvious overheads in float8 comparison include having to check for NaN,
and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces
a store/load to memory rather than just using a register. Looking at
those might be more beneficial than messing with abbreviations.
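
As an illustration of where the NaN overhead comes from, here is a Python
sketch of a three-way float comparison that gives NaN a fixed place in the
sort order (equal to itself and greater than every other value, which is how
PostgreSQL documents float ordering). The name float8_cmp_sketch is
hypothetical and this is not the server's code; the point is only that the
NaN branches run before the ordinary comparison can.

```python
import math
from functools import cmp_to_key

def float8_cmp_sketch(a: float, b: float) -> int:
    """Three-way comparison with NaN sorted last and equal to itself."""
    if math.isnan(a):
        return 0 if math.isnan(b) else 1
    if math.isnan(b):
        return -1
    # Ordinary case: only reached after both NaN checks have been paid for.
    return (a > b) - (a < b)

values = [2.0, float("nan"), -1.0, float("inf"), 0.5]
ordered = sorted(values, key=cmp_to_key(float8_cmp_sketch))
print(ordered)  # prints [-1.0, 0.5, 2.0, inf, nan]
```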

-- 
Andrew (irc:RhodiumToad)


[HACKERS] Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))

2015-01-26 Thread Peter Geoghegan
On Mon, Jan 26, 2015 at 8:43 AM, Andrew Gierth
 wrote:
> Another spinoff from the abbreviation discussion. Peter Geoghegan
> suggested on IRC that numeric would benefit from abbreviation, and
> indeed it does (in some cases by a factor of about 6-7x or more, because
> numeric comparison is no speed demon).

Cool.

What I find particularly interesting about this patch is that it makes
sorting numerics significantly faster than even sorting float8 values,
at least some of the time, even though the latter has generic
SortSupport (for fmgr elision). Example:

postgres=# create table foo as select x::float8 x, x::numeric y from
(select random() * 1000 x from generate_series(1,100) a) b;
SELECT 100

This query takes about 525ms after repeated executions:  select *
from (select * from foo order by x offset 10) i;

However, this query takes about 412ms:
select * from (select * from foo order by y offset 10) i;

There is probably a good case to be made for float8 abbreviation
support. It's just as well that your datum abbreviation patch doesn't
imply that pass-by-value types cannot be abbreviated across the board
(it only implies that abbreviation of pass-by-value types is not
supported in the datum sort case). :-)

Anyway, the second query above (the one with the numeric ORDER BY
column) is enormously faster than the same query executed against
master's tip. That takes about 1720ms following repeated executions.
So at least that case is over 4x faster, suggesting that abbreviation
support for numeric is well worthwhile. So I'm signed up to review
this one too.
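
For anyone new to the idea, the general shape of an abbreviated-key sort can
be sketched in Python as follows. The abbreviation used here (the floor of
the value) is a stand-in chosen for this example and is not the encoding the
patch uses; the names abbreviate and compare_abbreviated are likewise made
up. The only property relied on is that a smaller abbreviated key implies a
smaller value, so the expensive comparison is needed only when the cheap keys
tie.

```python
import math
from decimal import Decimal
from functools import cmp_to_key

full_comparisons = 0

def abbreviate(value: Decimal) -> int:
    """Hypothetical lossy key: floor(value). Order-preserving but not unique."""
    return math.floor(value)

def compare_abbreviated(a, b) -> int:
    """Compare (abbrev, value) pairs, using the full value only on abbrev ties."""
    global full_comparisons
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    full_comparisons += 1  # tie-breaker: the authoritative, slower comparison
    return (a[1] > b[1]) - (a[1] < b[1])

values = [Decimal(s) for s in ("12.50", "3.14", "12.25", "-7.9", "100", "3.14")]
pairs = [(abbreviate(v), v) for v in values]
ordered = [v for _, v in sorted(pairs, key=cmp_to_key(compare_abbreviated))]
print(ordered)
print("full comparisons needed:", full_comparisons)
```

The fewer ties the abbreviated keys produce, the more of the sort runs on
cheap integer comparisons alone.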
-- 
Peter Geoghegan

